Data Engineering

Parallelize spark jobs on the same cluster?

OliverLewis
New Contributor

What's the best way to parallelize multiple Spark jobs on the same cluster during a backfill?

3 REPLIES

ron_defreitas
Contributor

In the past I used direct multi-threaded orchestration inside of driver applications, but that was prior to Databricks supporting multi-task jobs.
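For reference, the driver-side, multi-threaded approach can be sketched roughly as below. This is a minimal sketch, not the exact code from the reply: `run_backfill` and `process_partition` are hypothetical names, and the commented usage assumes a shared `spark` session plus placeholder table names. It relies on the fact that Spark's scheduler accepts jobs submitted concurrently from several threads of one driver.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_backfill(partitions, process_partition, max_workers=4):
    """Call process_partition once per partition from a thread pool.

    Returns (results, failures), both keyed by partition, so that one
    failed partition does not abort the rest of the backfill.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front; the pool caps how many run at once.
        futures = {pool.submit(process_partition, p): p for p in partitions}
        for future in as_completed(futures):
            partition = futures[future]
            try:
                results[partition] = future.result()
            except Exception as exc:  # record the error and keep going
                failures[partition] = exc
    return results, failures


# Hypothetical usage on a driver (table and column names are placeholders):
#
# def process_partition(day):
#     df = spark.read.table("source_table").where(f"event_date = '{day}'")
#     df.write.mode("append").saveAsTable("target_table")
#     return day
#
# results, failures = run_backfill(["2021-01-01", "2021-01-02"], process_partition)
```

Each thread triggers its own Spark action, so the cluster can work on several partitions at once. Setting `spark.scheduler.mode` to `FAIR` may help concurrent jobs share executors more evenly, though whether it helps depends on the workload.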

If you create a job through the Workflows tab, you can set up multiple notebook, Python, or JAR tasks to run in parallel, and configure a dependency graph between them if desired.

You can either run those tasks on separate clusters within a single job or share the resources of one or more clusters across different tasks.

You can do this without having to assign a schedule to the job.
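As an illustration, a multi-task job of this kind could be described with a payload along the following lines. This is a sketch that assumes the Jobs API 2.1 field names (`tasks`, `task_key`, `depends_on`, `job_cluster_key`, `job_clusters`); the helper `build_backfill_job`, the notebook paths, the `run_date` parameter, and the cluster settings are all hypothetical placeholders to replace with your own values.

```python
def build_backfill_job(name, notebook_path, dates, cluster_key="shared_backfill_cluster"):
    """Build a job spec with one parallel task per date plus a final task.

    Tasks without depends_on are free to run in parallel; the final task
    waits for all of them. No schedule is set, so the job only runs when
    it is triggered manually or through the API.
    """
    backfill_tasks = [
        {
            "task_key": f"backfill_{d}",
            "job_cluster_key": cluster_key,  # every task shares one cluster
            "notebook_task": {
                "notebook_path": notebook_path,
                "base_parameters": {"run_date": d},  # hypothetical parameter
            },
        }
        for d in dates
    ]
    validate_task = {
        "task_key": "validate",
        "depends_on": [{"task_key": t["task_key"]} for t in backfill_tasks],
        "job_cluster_key": cluster_key,
        "notebook_task": {"notebook_path": f"{notebook_path}_validate"},  # placeholder
    }
    return {
        "name": name,
        "job_clusters": [
            {
                "job_cluster_key": cluster_key,
                "new_cluster": {
                    # Placeholder values; use ones valid in your workspace.
                    "spark_version": "<spark-version>",
                    "node_type_id": "<node-type>",
                    "num_workers": 4,
                },
            }
        ],
        "tasks": backfill_tasks + [validate_task],
    }
```

The resulting dictionary can be serialized with `json.dumps` and submitted when creating the job, or used as a guide for the equivalent settings in the Workflows UI.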


Hope this helps!

Kaniz_Fatma
Community Manager

Hi @Oliver Lewis, we haven't heard from you since the last response from @Ron DeFreitas, and I'm checking back to see if his suggestions helped you. Otherwise, if you have a solution, please share it with the community, as it can be helpful to others.

Hi @Oliver Lewis,

Just a friendly follow-up. Did any of the responses help you resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.
