Parallelize Spark jobs on the same cluster?

OliverLewis
New Contributor

What's the best way to parallelize multiple Spark jobs on the same cluster during a backfill?

ron_defreitas
Contributor

In the past I used multi-threaded orchestration directly inside driver applications, but that was before Databricks supported multi-task jobs.
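For anyone curious what that older driver-side approach can look like, here is a minimal sketch using a thread pool. It assumes each unit of the backfill can be expressed as a function that triggers a Spark action; `run_in_parallel` and `backfill_day` are hypothetical names for illustration, and `backfill_day` is a placeholder that does not touch Spark so the snippet runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_parallel(job_fn, partitions, max_workers=4):
    """Call job_fn(partition) for each partition on a thread pool.

    Returns (results, errors), both keyed by partition, so one failed
    partition does not hide the outcome of the others.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the partition it is processing.
        futures = {pool.submit(job_fn, p): p for p in partitions}
        for fut in as_completed(futures):
            partition = futures[fut]
            try:
                results[partition] = fut.result()
            except Exception as exc:  # collect the failure and keep going
                errors[partition] = exc
    return results, errors


def backfill_day(day):
    # Hypothetical placeholder. In a real driver this would run a Spark
    # action for one day, e.g. read a date partition and write it out.
    return f"backfilled {day}"
```

Calling `run_in_parallel(backfill_day, ["2023-01-01", "2023-01-02"])` submits both days at once. Spark accepts jobs submitted from multiple threads of the same application, but by default it schedules them FIFO, so setting `spark.scheduler.mode` to `FAIR` may be worth considering if the jobs should share the cluster more evenly.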

If you create a job through the Workflows tab, you can set up multiple notebook, Python, or JAR tasks to run in parallel, and configure a dependency graph between them if desired.

You can either run those tasks on separate clusters within a single job or share the resources of one or more clusters across different tasks.
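As a rough illustration of the shared-cluster option, the sketch below builds a job definition in the general shape of a Databricks Jobs API 2.1 create request: several independent notebook tasks that can run in parallel on one shared job cluster, followed by a task that depends on all of them. The helper name `build_backfill_job`, the notebook paths, the task keys, and the cluster values are all assumptions for illustration; the angle-bracket placeholders must be replaced with values valid for your workspace.

```python
import json


def build_backfill_job(name, notebook_paths, cluster_key="shared_backfill_cluster"):
    """Build a multi-task job definition that shares one job cluster."""
    # Independent tasks: no depends_on, so they are eligible to run in parallel.
    backfill_tasks = [
        {
            "task_key": f"backfill_{i}",
            "job_cluster_key": cluster_key,
            "notebook_task": {"notebook_path": path},
        }
        for i, path in enumerate(notebook_paths)
    ]
    # A final task that waits for every backfill task (the dependency graph).
    validate_task = {
        "task_key": "validate",
        "job_cluster_key": cluster_key,
        "depends_on": [{"task_key": t["task_key"]} for t in backfill_tasks],
        "notebook_task": {"notebook_path": "/Backfill/validate"},  # hypothetical path
    }
    return {
        "name": name,
        "job_clusters": [
            {
                "job_cluster_key": cluster_key,
                "new_cluster": {
                    "spark_version": "<runtime-version>",  # placeholder
                    "node_type_id": "<node-type>",         # placeholder
                    "num_workers": 4,                      # assumed size
                },
            }
        ],
        "tasks": backfill_tasks + [validate_task],
    }


job = build_backfill_job("backfill", ["/Backfill/part_a", "/Backfill/part_b"])
payload = json.dumps(job, indent=2)  # JSON body for a job-create request
```

To run the tasks on separate clusters instead, each task could reference its own `job_cluster_key`, with one matching entry per key under `job_clusters`.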

You can do this without having to assign a schedule to the job.

Hope this helps!

Hi @Oliver Lewis,

Just a friendly follow-up. Did any of the responses help resolve your question? If so, please mark it as the best answer. Otherwise, please let us know if you still need help.