Data Engineering

Job Cluster in Databricks workflow

jainshasha
New Contributor III

Hi,

I have configured 20 different workflows in Databricks, each with its own job cluster (each with a different name). All 20 workflows are scheduled to run at the same time. But even though a different job cluster is configured in each of them, they run sequentially, waiting for a cluster until one becomes available. I was expecting all of them to run in parallel on their own job clusters. Why is this not happening? What do I need to change so that each of them runs on its own, given that a different cluster is configured in each of them?

 

Thanks.

11 REPLIES

Hi @Retired_mod 

Thanks for the reply. Regarding my query: my ask is to run 20 different workflows at the same time. They are independent of each other, so I want all of them to start executing at the same time. That is why I gave each of them a different job cluster, but when they are scheduled to run at the same time, 19 of them keep waiting until 1 workflow completes, whereas my expectation was that Databricks would start executing all of them at the same time, so they would all finish at almost the same time.
Can't Databricks or the cloud provider launch a job cluster for each of the 20 workflows simultaneously?

Wojciech_BUK
Valued Contributor III

Can you share a screenshot of the job configuration and the job cluster configuration?
If you run 2 separate jobs (workflows) at the same time on different job clusters, they should run in parallel, unless those job clusters are based on a cluster pool or you have some sort of dependency implemented.

There is also a limit on the total number of tasks running simultaneously (1000), so it may be worth checking that with your workspace admin.
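
For reference, here is a minimal sketch of what a dedicated (non-pool) job cluster looks like in a Jobs 2.1 API payload - this is purely illustrative, not your actual config, and the host, token, runtime, node type and notebook path are placeholders:

# Minimal sketch: a job that gets its own dedicated job cluster, i.e. new_cluster is
# defined inline and no instance_pool_id is set, so it is NOT tied to a cluster pool.
import requests

job_payload = {
    "name": "workflow_01",
    "job_clusters": [
        {
            "job_cluster_key": "workflow_01_cluster",
            "new_cluster": {
                "spark_version": "14.3.x-scala2.12",   # placeholder runtime
                "node_type_id": "i3.xlarge",           # placeholder node type
                "num_workers": 2,
                # no "instance_pool_id" here -> not based on a cluster pool
            },
        }
    ],
    "tasks": [
        {
            "task_key": "main",
            "job_cluster_key": "workflow_01_cluster",
            "notebook_task": {"notebook_path": "/Workspace/path/to/notebook"},
        }
    ],
}

resp = requests.post(
    "https://<workspace-host>/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <token>"},
    json=job_payload,
)
print(resp.json())  # returns the new job_id on success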

jainshasha
New Contributor III

Hi @Retired_mod 
Attaching screenshots of 5 of the workflows that are scheduled at the same time.
[Attachments: Screenshot 2024-05-06 at 2.57.02 PM.png, Screenshot 2024-05-06 at 2.56.50 PM.png, Screenshot 2024-05-06 at 2.56.37 PM.png, Screenshot 2024-05-06 at 2.56.21 PM.png, Screenshot 2024-05-06 at 2.56.09 PM.png]

Wojciech_BUK
Valued Contributor III

Hi @jainshasha 
I tried to replicate your problem, but in my case I was able to run the jobs in parallel
(the only difference is that I am running the notebook from the workspace, not from a repo).

As you can see, the jobs did not start at exactly the same time, but they ran in parallel.

[Attachments: Wojciech_BUK_0-1714993733572.png, Wojciech_BUK_1-1714993888413.png]

 

Can you send a screenshot from your Job Runs page and Job Compute page?
Are you using spot instances?

 

Hi @Wojciech_BUK 

Regarding your question "Are you using spot instances?": how can I check this?


Attaching the screenshots
[Attachments: Screenshot 2024-05-06 at 7.13.40 PM.png, Screenshot 2024-05-06 at 7.10.37 PM.png]

Wojciech_BUK
Valued Contributor III

@jainshasha 
Based on the screenshot you sent, it looks like your jobs start at 12:30 and run in parallel.

Why do you think your jobs are waiting for clusters?

 

@Wojciech_BUK 

Because when all of them start at 12:30, only one of them shows the circle icon that says "running", whereas the others show the "pending for cluster" icon. Also, considering all of them do almost the same processing, none of them finish at the same time; they finish one by one. That makes me a little curious whether they are all really running in parallel or not. Ideally, if all of them got similar resources, they should all finish within 10 minutes, i.e. by 12:40, but it took 30 minutes to finish them all.

Wojciech_BUK
Valued Contributor III

@jainshasha based on the information you have provided, my assumption is that you might be waiting for the cloud provider (AWS) to provision the VMs (clusters) for you.


"Finding instances for new nodes" means that Databricks is attempting to provision the necessary AWS instances. This will often take longer if a) the cluster is larger, b) the cluster is a spot cluster, or c) the instance type is in high demand.

I don't have AWS Databricks, but you can find out whether you are using spot instances somewhere in the cluster configuration. There is an old article with the old UI, but maybe it will help you find out whether you are using spot or not:
https://www.databricks.com/blog/2016/10/25/running-apache-spark-clusters-with-spot-instances-in-data...
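
If you prefer to check it programmatically, here is a rough sketch (assuming an AWS workspace; host, token and job_id are placeholders) that reads the job settings via the Jobs 2.1 API and prints the spot-related aws_attributes of each job cluster:

# Sketch: in the new_cluster spec, aws_attributes.availability is typically
# SPOT, SPOT_WITH_FALLBACK or ON_DEMAND; first_on_demand says how many nodes
# are always on-demand.
import requests

resp = requests.get(
    "https://<workspace-host>/api/2.1/jobs/get",
    headers={"Authorization": "Bearer <token>"},
    params={"job_id": 123456},  # placeholder job id
)
settings = resp.json()["settings"]

for jc in settings.get("job_clusters", []):
    aws_attrs = jc["new_cluster"].get("aws_attributes", {})
    print(jc["job_cluster_key"],
          "availability:", aws_attrs.get("availability", "<not set>"),
          "first_on_demand:", aws_attrs.get("first_on_demand", "<not set>"))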

That is, of course, assuming there is no strong dependency somewhere inside the code where one task blocks another.

One more thing - because I am more from Azure than AWS - in Azure there is something like a QUOTA on the Azure subscription that limits how many VMs of a certain size you can provision at one time.
Maybe there is something like that in AWS that prevents you from starting more than "X" clusters at once.
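
If you want to rule out a cloud-side quota, something like this boto3 sketch could list the EC2 quotas that might cap how many cluster VMs start at once (this assumes AWS credentials are already configured; the region and the name filter are just placeholders):

# Rough sketch: list EC2-related service quotas (vCPUs, spot requests, ...) to see
# whether an account-level quota could be limiting simultaneous cluster launches.
import boto3

sq = boto3.client("service-quotas", region_name="us-east-1")  # placeholder region
paginator = sq.get_paginator("list_service_quotas")

for page in paginator.paginate(ServiceCode="ec2"):
    for quota in page["Quotas"]:
        if "Standard" in quota["QuotaName"] or "Spot" in quota["QuotaName"]:
            print(quota["QuotaName"], "->", quota["Value"])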

Just some advice below:
You can also change your setup to have one job with multiple tasks running in parallel and configure a job cluster for one or for many tasks (cluster reuse). You can save $$ because Databricks will run as many tasks in parallel as the cluster can handle, and you don't wait for cluster start time (you can provision a bigger cluster and let it run in parallel); see the sketch below.
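
A minimal sketch of that pattern (purely illustrative - names, paths and cluster sizes are placeholders): one job, a single shared job cluster, and 20 notebook tasks with no depends_on between them, so they are scheduled in parallel on the same cluster:

# Sketch: 20 independent tasks reuse one job cluster via the same job_cluster_key,
# so only one cluster has to start instead of 20.
import requests

job_payload = {
    "name": "combined_workflows",
    "job_clusters": [
        {
            "job_cluster_key": "shared_cluster",
            "new_cluster": {
                "spark_version": "14.3.x-scala2.12",  # placeholder runtime
                "node_type_id": "i3.xlarge",          # placeholder node type
                "num_workers": 8,                     # size it for the combined load
            },
        }
    ],
    # No depends_on between tasks -> they run in parallel on the shared cluster.
    "tasks": [
        {
            "task_key": f"workflow_{i:02d}",
            "job_cluster_key": "shared_cluster",
            "notebook_task": {"notebook_path": f"/Workspace/notebooks/workflow_{i:02d}"},
        }
        for i in range(1, 21)
    ],
}

requests.post(
    "https://<workspace-host>/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <token>"},
    json=job_payload,
)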

Sorry @jainshasha - I have no more ideas about what the reason may be 😞 

emora
New Contributor III

Hello all,

Did you try configuring the Advanced settings?

[Attachment: emora_0-1715068234044.png]

You must configure this option to have concurrent runs for one workflow.
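
For reference, that option corresponds (as far as I know) to max_concurrent_runs in the Jobs API. A small sketch of bumping it via jobs/update, with host, token and job_id as placeholders:

# Sketch: raise the maximum number of concurrent runs for a single job.
import requests

requests.post(
    "https://<workspace-host>/api/2.1/jobs/update",
    headers={"Authorization": "Bearer <token>"},
    json={
        "job_id": 123456,                        # placeholder job id
        "new_settings": {"max_concurrent_runs": 5},
    },
)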

jainshasha
New Contributor III

Hi @emora 

Thanks for the reply, but this query is not about running concurrent runs of the same workflow; it is about running different workflows concurrently, in parallel.

 

@Wojciech_BUK 
I am using Google Cloud for my Databricks. Do you know of any limitation there on launching many clusters at a time?
Also, in your view, what is the best way to launch clusters? Obviously, if I am launching 20 different clusters at a time, a lot of time is spent just launching them. (By the way, one more question: if cluster launching takes a lot of time, does that time also add cost on the cloud side and on the Databricks side?) A cluster pool is not very suitable for me, because with a pool I would have to keep at least one cluster running all the time, which would eventually cost me more than launching 20 clusters at a time.

emora
New Contributor III

Honestly, you shouldn't have any kind of limitation executing different workflows.

I did a test case in my Databricks, and if your workflows each use a job cluster you shouldn't hit a limitation. But I did all my tests in Azure, and just so you know, all the resources you create in Databricks (I mean clusters) are tied to an Azure subscription, in which the VMs for the cluster specification are created. So you may need to pay attention to that related subscription (I don't know how Google works in these terms) to check whether you have any kind of limitation on creating VMs for your clusters; see the sketch below for one way to check the Google-side quotas.
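
Since you are on Google Cloud, here is a rough sketch of listing the regional Compute Engine quotas (CPUs, instances, etc.) that could cap how many cluster VMs are provisioned at once - this assumes the google-cloud-compute client library is installed and credentials are set up, and the project and region below are placeholders:

# Rough sketch: print the regional quota usage vs. limit for the region where the
# Databricks clusters are created.
from google.cloud import compute_v1

client = compute_v1.RegionsClient()
region = client.get(project="my-gcp-project", region="us-central1")  # placeholders

for quota in region.quotas:
    # Each quota entry carries a metric name, a limit and the current usage.
    print(f"{quota.metric}: {quota.usage}/{quota.limit}")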
