Running JAR jobs in parallel on a cluster
06-05-2024 02:34 AM
Hi everyone, I'm trying to find out whether Databricks supports clusters that can scale out with additional drivers to run new jobs in parallel. If not, is there a workaround? I've noticed that both all-purpose and job compute clusters have only a single driver node.
I'm trying to run my Spark applications from a JAR file, passing different arguments on each run. The applications need to run in parallel, not sequentially or merely concurrently, because I have a fairly strict time constraint. I also need autoscaling support for the same reason.
I'm quite new to Databricks and Spark, so I'd greatly appreciate anyone's input.
06-07-2024 07:23 AM
Databricks does not support clusters with multiple driver nodes, so a single cluster cannot scale out to run new jobs in parallel. Every cluster has exactly one driver, and all jobs submitted to that cluster share it.
Workarounds for Achieving Parallel Job Execution:
1. Multiple Clusters:
- Create Multiple Job Clusters: Set up multiple job clusters, each with its own driver node, and submit your Spark applications with different arguments to separate clusters so they run in parallel (see the run-submit sketch after this list).
- Autoscaling Support: Configure each cluster with autoscaling by setting a minimum and maximum number of workers, letting Databricks adjust resources dynamically to each job's workload.
2. Job Scheduling and Orchestration:
- Databricks Workflows: Use Databricks Workflows to schedule and orchestrate multiple jobs. Tasks with no dependencies between them run in parallel.
- External Orchestration Tools: Use tools like Apache Airflow or Azure Data Factory to trigger multiple Databricks runs in parallel (an Airflow sketch follows below).
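To make the first workaround concrete, here is a minimal sketch of submitting several one-time runs via the Jobs API 2.1 `runs/submit` endpoint, each on its own autoscaling job cluster. The workspace URL, token, node type, JAR path, and main class are placeholders you would replace with your own values; because each run gets its own job cluster (and therefore its own driver), the runs execute in parallel rather than queueing behind one another:

```python
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def submit_jar_run(run_name: str, args: list[str]) -> int:
    """Submit a one-time run (Jobs API 2.1) on its own autoscaling job cluster."""
    payload = {
        "run_name": run_name,
        "tasks": [
            {
                "task_key": "jar_task",
                "new_cluster": {
                    "spark_version": "14.3.x-scala2.12",
                    "node_type_id": "i3.xlarge",  # placeholder node type
                    # Databricks scales workers between these bounds per run.
                    "autoscale": {"min_workers": 2, "max_workers": 8},
                },
                "libraries": [{"jar": "dbfs:/FileStore/jars/my-app.jar"}],  # placeholder
                "spark_jar_task": {
                    "main_class_name": "com.example.Main",  # placeholder class
                    "parameters": args,
                },
            }
        ],
    }
    resp = requests.post(
        f"{WORKSPACE_URL}/api/2.1/jobs/runs/submit", headers=HEADERS, json=payload
    )
    resp.raise_for_status()
    return resp.json()["run_id"]

# Each call starts an independent run with its own driver, so these three
# runs execute in parallel with different arguments.
run_ids = [submit_jar_run(f"jar-run-{i}", ["--partition", str(i)]) for i in range(3)]
print(run_ids)
```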
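And here is a hedged sketch of the external-orchestration option using the Airflow Databricks provider. The connection ID, cluster spec, JAR path, and main class are assumptions to adapt; Airflow schedules tasks with no dependencies between them in parallel, and each `DatabricksSubmitRunOperator` task creates its own job cluster:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import (
    DatabricksSubmitRunOperator,
)

new_cluster = {
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "i3.xlarge",  # placeholder node type
    "autoscale": {"min_workers": 2, "max_workers": 8},
}

with DAG(
    dag_id="parallel_jar_runs",
    start_date=datetime(2024, 6, 1),
    schedule=None,  # trigger manually; use schedule_interval on older Airflow
    catchup=False,
) as dag:
    # No dependencies are declared between these tasks, so Airflow runs them
    # in parallel; each gets its own job cluster and driver.
    for i in range(3):
        DatabricksSubmitRunOperator(
            task_id=f"jar_run_{i}",
            databricks_conn_id="databricks_default",  # placeholder connection
            new_cluster=new_cluster,
            spark_jar_task={
                "main_class_name": "com.example.Main",  # placeholder class
                "parameters": ["--partition", str(i)],
            },
            libraries=[{"jar": "dbfs:/FileStore/jars/my-app.jar"}],  # placeholder
        )
```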

