Job Cluster in Databricks workflow

jainshasha
New Contributor II

Hi,

I have configured 20 different workflows in Databricks, each with its own job cluster (each with a different name). All 20 workflows are scheduled to run at the same time, but even though each one has its own job cluster, they run sequentially, waiting for a cluster to become available. I was expecting all of them to run in parallel on their own job clusters. Why is this not happening? What do I need to change so that each of them runs on its own, given that a different cluster is configured in each of them?

 

Thanks.

12 REPLIES

Kaniz
Community Manager

Hi @jainshasha,

Running multiple workflows in parallel, each with its own job cluster, is achievable in Databricks with the right configuration.

Let’s explore some options:

  1. Shared Job Clusters:

    • Within a single job, multiple tasks can share one job cluster, so the cluster only has to start once.

  2. Orchestration Job:

    • A single "parent" job can orchestrate your workflows as tasks, so they are all triggered together.

  3. Rewrite/Reconfigure Jobs:

    • Depending on your use case, you might be able to rewrite or reconfigure your 20 workflows into a single job with multiple tasks.
    • Each task within the job can then run on its own cluster configuration (see the sketch after this list).
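As a rough illustration of option 3, here is a minimal sketch using the Databricks Python SDK (databricks-sdk); the job name, notebook paths, node type and runtime version are placeholders, not values from this thread.

```python
# Minimal sketch: one job, 20 independent tasks, each on its own job cluster.
# All names, paths and sizes below are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()  # picks up credentials from the environment/config profile

# One job cluster definition per task, so each task gets its own compute.
job_clusters = [
    jobs.JobCluster(
        job_cluster_key=f"cluster_{i}",
        new_cluster=compute.ClusterSpec(
            spark_version="14.3.x-scala2.12",  # placeholder runtime
            node_type_id="n2-standard-4",      # placeholder node type
            num_workers=2,
        ),
    )
    for i in range(1, 21)
]

# No depends_on between tasks, so they are free to start in parallel.
tasks = [
    jobs.Task(
        task_key=f"workflow_{i}",
        job_cluster_key=f"cluster_{i}",
        notebook_task=jobs.NotebookTask(notebook_path=f"/Workspace/etl/workflow_{i}"),
    )
    for i in range(1, 21)
]

created = w.jobs.create(name="all-20-workflows", job_clusters=job_clusters, tasks=tasks)
print(f"Created job {created.job_id}")
```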

Remember that Databricks is designed to optimize resource utilization while ensuring efficient execution. Choose the approach that best fits your specific requirements and workload.

If you need further assistance, feel free to ask! 😊

 

jainshasha
New Contributor II

Hi @Kaniz 

Thanks for the reply. Regarding my query: I want to run 20 different workflows at the same time, and they are independent of each other, so I want all of them to start executing at the same time. That is why I gave each of them a different job cluster. But when they are scheduled to run at the same time, 19 of them keep waiting until 1 workflow completes, whereas my expectation was that Databricks would start executing all of them at the same time, so they would all finish at almost the same time.
Can't Databricks or the cloud provider launch a job cluster for each of the 20 workflows simultaneously?

Can you share a screenshot of the job configuration and the job cluster configuration?
If you run 2 separate jobs (workflows) at the same time on different job clusters, they should run in parallel, unless those job clusters are based on a cluster pool or you have some sort of dependency implemented.

There is a limit of 1000 tasks running simultaneously per workspace, so it may be worth checking that with your workspace admin.
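For what it's worth, one quick way to check this from outside the UI is to trigger the independent jobs back to back and then list the runs that are still active. Below is a minimal sketch using the Databricks Python SDK (databricks-sdk); the job IDs are placeholders.

```python
# Minimal sketch: trigger several independent jobs and list the active runs.
# The job IDs are placeholders for your workflows.
import time

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

job_ids = [101, 102, 103]  # placeholder job IDs

for jid in job_ids:
    w.jobs.run_now(job_id=jid)  # returns immediately; does not wait for the run to finish

time.sleep(60)  # give the job clusters a moment to start

# If the jobs really overlap, they should all show up here at the same time.
for run in w.jobs.list_runs(active_only=True):
    print(run.run_id, run.state.life_cycle_state if run.state else None)
```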

jainshasha
New Contributor II

Hi @Kaniz 
Attaching screenshots of 5 of the workflows that are scheduled at the same time:
Screenshot 2024-05-06 at 2.57.02 PM.png, Screenshot 2024-05-06 at 2.56.50 PM.png, Screenshot 2024-05-06 at 2.56.37 PM.png, Screenshot 2024-05-06 at 2.56.21 PM.png, Screenshot 2024-05-06 at 2.56.09 PM.png

Wojciech_BUK
Contributor III

Hi @jainshasha,
I tried to replicate your problem, but in my case I was able to run the jobs in parallel
(the only difference is that I am running the notebook from the workspace, not from a repo).

As you can see, the jobs did not start at exactly the same time, but they ran in parallel.

Wojciech_BUK_0-1714993733572.png

Wojciech_BUK_1-1714993888413.png

 

Can you send a screenshot of your Job Runs page and your Job Compute page?
Are you using spot instances?

 

Hi @Wojciech_BUK 

Are you using spot instances?
How do I check this?


Attaching the screenshots:
Screenshot 2024-05-06 at 7.13.40 PM.png, Screenshot 2024-05-06 at 7.10.37 PM.png

Wojciech_BUK
Contributor III

@jainshasha 
Based on the screenshot you sent, it looks like your jobs start at 12:30 and run in parallel.

Why do you think your jobs are waiting for clusters?

 

@Wojciech_BUK 

Because when they all start at 12:30, only one of them shows the circle icon that says "running", while the others show a "pending for cluster" icon. Also, considering that all of them do almost the same processing, none of them finish at the same time; they finish one by one. That makes me a little curious about whether they are all really running in parallel. Ideally, if they all got similar resources, they should all finish within 10 minutes, i.e. by 12:40, but it took 30 minutes to finish them all.
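One way to check where the time actually goes is to compare setup time (which covers cluster start) with execution time for the recent runs. Below is a minimal sketch with the Databricks Python SDK (databricks-sdk); the job IDs are placeholders, and the exact fields available can differ between single-task and multi-task runs, so treat it only as a rough diagnostic.

```python
# Minimal sketch: compare setup (cluster start) time with execution time
# for the most recent runs of a few jobs. The job IDs are placeholders.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

for jid in [101, 102, 103]:  # placeholder job IDs
    for run in w.jobs.list_runs(job_id=jid, limit=5):
        print(jid, run.run_id,
              "setup_ms:", run.setup_duration,
              "execution_ms:", run.execution_duration)
```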

Wojciech_BUK
Contributor III

@jainshasha, based on the information you have provided, my assumption is that you might be waiting for the cloud provider (AWS) to provision the VMs (cluster nodes) for you.


"Finding instances for new nodes" means that Databricks is attempting to provision the necessary AWS instances. This will often take longer if (a) the cluster is larger, (b) the cluster is a spot cluster, or (c) the instance size is in high demand.

I don't have AWS Databricks, but you can find out whether you are using spot instances somewhere in the cluster configuration. There is an old article with the old UI, but maybe it will help you find that information:
https://www.databricks.com/blog/2016/10/25/running-apache-spark-clusters-with-spot-instances-in-data...
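If the UI is hard to navigate, a minimal sketch with the Databricks Python SDK (databricks-sdk) that reads a job's cluster definitions and prints the configured availability (spot/preemptible vs. on-demand) might also help; the job ID is a placeholder.

```python
# Minimal sketch: print the availability (spot/preemptible vs. on-demand)
# configured for each job cluster of a job. The job ID is a placeholder.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

job = w.jobs.get(job_id=101)  # placeholder job ID
for jc in (job.settings.job_clusters or []):
    spec = jc.new_cluster
    print(jc.job_cluster_key)
    # The attribute block depends on the cloud the workspace runs on.
    if spec.aws_attributes:
        print("  AWS availability:", spec.aws_attributes.availability)
    if spec.gcp_attributes:
        print("  GCP availability:", spec.gcp_attributes.availability)
    if spec.azure_attributes:
        print("  Azure availability:", spec.azure_attributes.availability)
```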

This assumes, of course, that there is no strong dependency somewhere inside the code where one task blocks another.

One more thing: because I come more from Azure than AWS, in Azure there is a quota on the subscription that limits how many VMs of a certain size you can provision at one time.
Maybe there is something like that in AWS that prevents you from starting more than "X" clusters at once.

Just some advice:
You can also change your setup to a single job with multiple tasks running in parallel, and configure a job cluster for one or for many tasks (cluster reuse). You can save money, because Databricks will run as many tasks in parallel as the cluster can handle, and you don't wait for a separate cluster start for each workflow (you can provision a bigger cluster and let it run the tasks in parallel); a rough sketch follows below.
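A rough sketch of that layout with the Databricks Python SDK (databricks-sdk), with placeholder names, paths and sizes:

```python
# Minimal sketch: one job, many parallel tasks, one reused job cluster.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

shared = jobs.JobCluster(
    job_cluster_key="shared_cluster",
    new_cluster=compute.ClusterSpec(
        spark_version="14.3.x-scala2.12",  # placeholder runtime
        node_type_id="n2-standard-8",      # placeholder node type, sized for many tasks
        num_workers=8,
    ),
)

# All tasks point at the same job_cluster_key and have no depends_on,
# so they can run in parallel on the shared cluster.
tasks = [
    jobs.Task(
        task_key=f"workflow_{i}",
        job_cluster_key="shared_cluster",
        notebook_task=jobs.NotebookTask(notebook_path=f"/Workspace/etl/workflow_{i}"),
    )
    for i in range(1, 21)
]

w.jobs.create(name="shared-cluster-workflows", job_clusters=[shared], tasks=tasks)
```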

Sorry @jainshasha, I have no more ideas about what the reason might be 😞

emora
New Contributor II

Hello all,

Did you try configuring the Advanced settings?

emora_0-1715068234044.png

You must configure this option to allow concurrent runs of a single workflow.
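The setting shown there is presumably the job's maximum concurrent runs; a minimal sketch of changing it with the Databricks Python SDK (databricks-sdk), with a placeholder job ID, is below. As the next reply points out, this only affects concurrent runs of the same workflow.

```python
# Minimal sketch: allow up to 5 concurrent runs of a single job.
# The job ID is a placeholder; the default is 1 concurrent run per job.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()
w.jobs.update(
    job_id=101,  # placeholder job ID
    new_settings=jobs.JobSettings(max_concurrent_runs=5),
)
```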

jainshasha
New Contributor II

Hi @emora 

Thanks for the reply, but this query is not about running concurrent runs of the same workflow; it is about running different workflows concurrently, in parallel.

 

@Wojciech_BUK 
I am using Google Cloud for my Databricks. Do you know of any limitation there around launching too many clusters at a time?
Also, in your view, what is the best way to launch the clusters? Obviously, if I am launching 20 different clusters at a time, a lot of time is spent just launching them. (By the way, one question: if cluster launch takes a lot of time, does that time also add cost on the cloud side and on the Databricks side?) A cluster pool is not very suitable here, because with a pool I would also have to keep at least one cluster running all the time, which would eventually cost me more than launching 20 clusters at a time.

emora
New Contributor II

Honestly, you shouldn't have any kind of limitation executing different workflows.
I did a test case in my Databricks, and if your workflows each use a job cluster you shouldn't hit a limitation. But I did all my tests in Azure, and just so you know, all the resources you create from Databricks (I mean clusters) are tied to an Azure subscription in which the VMs for the cluster specification are created. So maybe you should pay attention to that related subscription (I don't know how Google works in these terms) to check whether you have any kind of limitation on creating VMs for your clusters.
I did a test case in my Databricks and if you have your workflows with a job cluster your shouldn't have limitation. But I did all my test in Azure and just for you to know, all the resources that you need to create in your Databricks (I mean clusters) are related to a subscription in Azure in which you are creating the VM for the cluster specification. So maybe you must pay attetion to this related subscription (Don't know how Google works in this terms) to check if you have any kind of limitation creating VM for your clusters.