What is the deploy-mode when calling Spark in Databricks?

source2sea
Contributor

https://spark.apache.org/docs/latest/submitting-applications.html

Mainly, I want to know whether an extra classpath can be used when I submit a job.

ACCEPTED SOLUTION

Anonymous
Not applicable

@min shi:

In Databricks, when you run a job you are submitting a Spark application to run on a cluster. The default deploy-mode depends on the type of cluster the job runs on:

For interactive (all-purpose) clusters, the deploy-mode is client. The driver program runs on the machine that submitted the application, which in Databricks is the cluster's dedicated driver node rather than your local machine.

For job clusters, the deploy-mode is cluster. This means that the driver program runs on one of the worker nodes in the cluster.
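For contrast, with open-source spark-submit (the page linked above) you would choose the mode yourself. A minimal sketch, where the master URL, class name, and jar path are placeholders:

./bin/spark-submit --master spark://<master-host>:7077 --deploy-mode cluster --class com.example.Main /path/to/app.jar arg1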

Note that in Databricks you do not need to (and cannot) specify the deploy-mode yourself; Databricks configures it automatically. Instead, you submit the run through the Databricks CLI, which wraps the Jobs API. With the legacy CLI, for example:

databricks runs submit --json-file submit.json
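where submit.json is the run specification. For illustration, a minimal sketch of a JAR task against Jobs API 2.0; the cluster spec, paths, and class name here are placeholder assumptions, not values from this thread:

{
  "run_name": "example-run",
  "new_cluster": {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2
  },
  "libraries": [
    { "jar": "dbfs:/jars/app.jar" }
  ],
  "spark_jar_task": {
    "main_class_name": "com.example.Main",
    "parameters": ["arg1", "arg2"]
  }
}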

When you run this command, Databricks automatically handles the details of submitting the Spark application to the cluster and starting the driver program.

Regarding your question about an extra classpath: the standard spark-submit --jars and --py-files options still apply; in a Databricks job you pass them through the parameters of a spark_submit_task. Here's the shape of it:

"spark_submit_task": { "parameters": ["--class", "<main class>", "--jars", "<comma-separated list of jars>", "--py-files", "<comma-separated list of Python files>", "<path to jar file>", "<args>"] }

The --jars option is used to specify a comma-separated list of jars to be added to the classpath of the driver and executor JVMs. Similarly, the --py-files option is used to specify a comma-separated list of Python files to be added to the Python path of the driver and executor processes.
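Putting it together, a hedged end-to-end sketch; every path, the class name, and the cluster spec are assumptions for illustration (note that a spark_submit_task runs on a new cluster):

{
  "run_name": "submit-with-extra-classpath",
  "new_cluster": {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2
  },
  "spark_submit_task": {
    "parameters": [
      "--class", "com.example.Main",
      "--jars", "dbfs:/jars/dep1.jar,dbfs:/jars/dep2.jar",
      "--py-files", "dbfs:/pyfiles/helpers.py",
      "dbfs:/jars/app.jar",
      "arg1"
    ]
  }
}

databricks runs submit --json-file submit.json

As with plain spark-submit, the options come before the application jar and the application's own arguments come last.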
