This one is for Kubernetes (k8s) on Google Cloud, but you can adapt it to any cloud vendor.
I personally package the code as a zip file and pass the application name (in your case main.py) as the last argument to spark-submit, as shown below.
APPLICATION is your main.py. It does not need to be called main.py; it could be anything, such as testpython.py.
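The commands in this message rely on a few shell variables that are not defined here. A minimal sketch of how they might be set follows. The values are placeholders (only spark_on_gke and the bucket path appear in the commands in this message), and ZIP_URI and APP_URI are hypothetical names introduced just for illustration, so substitute your own:

```shell
# Placeholder values; substitute your own.
source_code="spark_on_gke"                      # top-level code directory; matches the zip name passed to --py-files
APPLICATION="main.py"                           # entry-point script (any file name works)
CODE_DIRECTORY_CLOUD="gs://spark-on-k8s/codes"  # bucket path used in this message

# Hypothetical helper variables: where the two uploaded artifacts end up
ZIP_URI="${CODE_DIRECTORY_CLOUD}/${source_code}.zip"
APP_URI="${CODE_DIRECTORY_CLOUD}/${APPLICATION}"

echo "py-files:    ${ZIP_URI}"
echo "application: ${APP_URI}"
```

The first URI is what --py-files points at, and the second is the last argument to spark-submit.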
CODE_DIRECTORY_CLOUD="gs://spark-on-k8s/codes"   # for AWS, use an s3:// path instead of gs://
# run zip from the directory that contains ${source_code}, so paths inside the archive start at the package root
zip -rq ${source_code}.zip ${source_code}
gsutil cp ${source_code}.zip $CODE_DIRECTORY_CLOUD   # for AWS, use "aws s3 cp" instead of "gsutil cp"
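# also upload the entry-point script on its own (path relative to the same directory as the zip step)
gsutil cp ${source_code}/src/${APPLICATION} $CODE_DIRECTORY_CLOUD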
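If you switch between vendors, the choice of copy command can be derived from the destination URI. A small sketch, assuming only the two CLIs mentioned in the comments above; copy_command is a hypothetical helper (not part of gsutil or the AWS CLI), and it only prints the command so you can inspect it before running it:

```shell
# copy_command: hypothetical helper that prints (does not run) the copy
# command matching the destination URI scheme.
copy_command() {
  src="$1"
  dest="$2"
  case "$dest" in
    gs://*) echo "gsutil cp ${src} ${dest}" ;;
    s3://*) echo "aws s3 cp ${src} ${dest}" ;;
    *)      echo "unsupported destination: ${dest}" >&2; return 1 ;;
  esac
}

# The s3 bucket below is a made-up example mirroring the gs one.
copy_command spark_on_gke.zip gs://spark-on-k8s/codes
copy_command spark_on_gke.zip s3://spark-on-k8s/codes
```

Once the printed command looks right, run it directly or through eval.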
Then submit your Spark job:
spark-submit --verbose \
--properties-file ${property_file} \
--master k8s://https://$KUBERNETES_MASTER_IP:443 \
--deploy-mode cluster \
--name $APPNAME \
--py-files $CODE_DIRECTORY_CLOUD/spark_on_gke.zip \
--conf spark.kubernetes.namespace=$NAMESPACE \
--conf spark.network.timeout=300 \
--conf spark.kubernetes.allocation.batch.size=3 \
--conf spark.kubernetes.allocation.batch.delay=1 \
--conf spark.kubernetes.driver.container.image=${IMAGEDRIVER} \
--conf spark.kubernetes.executor.container.image=${IMAGEDRIVER} \
--conf spark.kubernetes.driver.pod.name=$APPNAME \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-bq \
--conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
--conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.shuffleTracking.enabled=true \
--conf spark.dynamicAllocation.shuffleTracking.timeout=20s \
--conf spark.dynamicAllocation.executorIdleTimeout=30s \
--conf spark.dynamicAllocation.cachedExecutorIdleTimeout=40s \
--conf spark.dynamicAllocation.minExecutors=0 \
--conf spark.dynamicAllocation.maxExecutors=20 \
--conf spark.driver.cores=3 \
--conf spark.executor.cores=3 \
--conf spark.driver.memory=1024m \
--conf spark.executor.memory=1024m \
$CODE_DIRECTORY_CLOUD/${APPLICATION}
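The submit command above depends on several variables, and an empty one tends to produce a confusing spark-submit error. A minimal pre-flight sketch that could run before it; require_vars is a hypothetical helper name, and the variable list is simply the set referenced in the command above:

```shell
# require_vars: hypothetical helper; returns non-zero if any named
# variable is unset or empty, and reports each one on stderr.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${${name}:-}"
    if [ -z "$value" ]; then
      echo "missing variable: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Check every variable the spark-submit command references.
if require_vars property_file KUBERNETES_MASTER_IP APPNAME NAMESPACE \
                IMAGEDRIVER CODE_DIRECTORY_CLOUD APPLICATION; then
  echo "all variables set"
else
  echo "set the variables reported above, then retry"
fi
```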
HTH
Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom
https://en.everybodywiki.com/Mich_Talebzadeh
Disclaimer: The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, "one test result is worth one-thousand expert opinions" (Wernher von Braun).