
Parallelize Spark jobs on the same cluster?

OliverLewis
New Contributor

What's the best way to parallelize multiple Spark jobs on the same cluster during a backfill?

3 REPLIES

ron_defreitas
Contributor

In the past, I used direct multi-threaded orchestration inside the driver application, but that was before Databricks supported multi-task jobs.
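For reference, here's a rough sketch of that driver-side pattern, in case you still need it: it assumes it runs inside a Databricks notebook (where dbutils is predefined), and the notebook path and backfill dates are placeholders, not anything from your setup.

```python
# Driver-side parallelism: run several child notebooks concurrently from
# one notebook using dbutils.notebook.run in a thread pool.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical backfill partitions; one child notebook run per date.
dates = ["2022-01-01", "2022-01-02", "2022-01-03"]

def run_backfill(run_date):
    # dbutils.notebook.run(path, timeout_seconds, arguments) blocks until
    # the child notebook finishes and returns its exit value.
    return dbutils.notebook.run(
        "/Repos/etl/backfill_notebook",  # placeholder path
        3600,
        {"run_date": run_date},
    )

# Cap concurrency so the shared cluster isn't oversubscribed.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_backfill, dates))

print(results)
```

The max_workers cap matters here: every child notebook competes for the same cluster's executors, so unbounded threads just queue Spark jobs against each other.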

If you create a job through the Workflows tab, you can set up multiple notebook, Python, or JAR tasks to run in parallel, and you can configure a dependency graph between them if desired.

You can either give each task its own cluster within a single job or share the resources of one or more clusters across different tasks.

You can do this without having to assign a schedule to the job.
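To make that concrete, here's a sketch of the same multi-task job defined through the Jobs API 2.1 (the Workflows UI builds the equivalent definition). The workspace host, token, notebook paths, and cluster spec are all placeholders: two tasks share one job cluster and run in parallel, and a third waits on both.

```python
# Create a multi-task job via the Jobs API 2.1.
import requests

payload = {
    "name": "backfill-parallel",
    "job_clusters": [{
        "job_cluster_key": "shared",
        "new_cluster": {                       # placeholder cluster spec
            "spark_version": "11.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 4,
        },
    }],
    "tasks": [
        {"task_key": "extract_a", "job_cluster_key": "shared",
         "notebook_task": {"notebook_path": "/Repos/etl/extract_a"}},
        {"task_key": "extract_b", "job_cluster_key": "shared",
         "notebook_task": {"notebook_path": "/Repos/etl/extract_b"}},
        # Runs only after both extracts succeed.
        {"task_key": "merge", "job_cluster_key": "shared",
         "depends_on": [{"task_key": "extract_a"},
                        {"task_key": "extract_b"}],
         "notebook_task": {"notebook_path": "/Repos/etl/merge"}},
    ],
}

resp = requests.post(
    "https://<workspace-host>/api/2.1/jobs/create",  # placeholder host
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # returns the new job_id
```

Tasks without a depends_on edge start in parallel automatically, and because no schedule is set, the job only runs when you trigger it (run-now in the UI or /api/2.1/jobs/run-now).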

Hope this helps!

Kaniz
Community Manager

Hi @Oliver Lewis, we haven't heard from you since @Ron DeFreitas's last response, and I wanted to check whether his suggestions helped. If you found a different solution, please share it with the community so it can help others.

Hi @Oliver Lewis,

Just a friendly follow-up: did any of the responses help you resolve your question? If so, please mark it as the best answer. Otherwise, please let us know if you still need help.