What is the deploy-mode when calling Spark in Databricks?

source2sea
Contributor

https://spark.apache.org/docs/latest/submitting-applications.html

Mainly, I want to know whether an extra classpath can be used when I submit a job.

ACCEPTED SOLUTION

Anonymous
Not applicable

@min shi:

In Databricks, when you run a job you are submitting a Spark application to run on a cluster. The default deploy-mode depends on the type of cluster the job runs on:

For interactive (all-purpose) clusters, the deploy-mode is client. The driver program runs on the machine that submitted the application, which in Databricks is the cluster's dedicated driver node rather than your local machine.

For job clusters, the deploy-mode is cluster. This means that the driver program runs on one of the worker nodes in the cluster.
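For contrast, with open-source spark-submit (the page linked above) you would choose the mode yourself. A minimal sketch, where the master URL, class name, and jar path are placeholders:

./bin/spark-submit --master spark://<master-host>:7077 --deploy-mode cluster --class com.example.Main /path/to/app.jar arg1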

Note that in Databricks you do not need to (and cannot) specify the deploy-mode yourself; Databricks configures it automatically. Instead, you submit the run through the Databricks CLI, which wraps the Jobs API. With the legacy CLI, for example:

databricks runs submit --json-file submit.json
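where submit.json is the run specification. For illustration, a minimal sketch of a JAR task against Jobs API 2.0; the cluster spec, paths, and class name here are placeholder assumptions, not values from this thread:

{
  "run_name": "example-run",
  "new_cluster": {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2
  },
  "libraries": [
    { "jar": "dbfs:/jars/app.jar" }
  ],
  "spark_jar_task": {
    "main_class_name": "com.example.Main",
    "parameters": ["arg1", "arg2"]
  }
}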

When you run this command, Databricks automatically handles the details of submitting the Spark application to the cluster and starting the driver program.

Regarding your question about an extra classpath: the standard spark-submit --jars and --py-files options still apply; in a Databricks job you pass them through the parameters of a spark_submit_task. Here's the shape of it:

"spark_submit_task": { "parameters": ["--class", "<main class>", "--jars", "<comma-separated list of jars>", "--py-files", "<comma-separated list of Python files>", "<path to jar file>", "<args>"] }

The --jars option is used to specify a comma-separated list of jars to be added to the classpath of the driver and executor JVMs. Similarly, the --py-files option is used to specify a comma-separated list of Python files to be added to the Python path of the driver and executor processes.
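Putting it together, a hedged end-to-end sketch; every path, the class name, and the cluster spec are assumptions for illustration (note that a spark_submit_task runs on a new cluster):

{
  "run_name": "submit-with-extra-classpath",
  "new_cluster": {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2
  },
  "spark_submit_task": {
    "parameters": [
      "--class", "com.example.Main",
      "--jars", "dbfs:/jars/dep1.jar,dbfs:/jars/dep2.jar",
      "--py-files", "dbfs:/pyfiles/helpers.py",
      "dbfs:/jars/app.jar",
      "arg1"
    ]
  }
}

databricks runs submit --json-file submit.json

As with plain spark-submit, the options come before the application jar and the application's own arguments come last.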
