Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks cluster random slow start times.

thackman
New Contributor II

We have a job that runs on single user job compute because we've had compatibility issues switching to shared compute.

Normally the cluster (1 driver, 1 worker) takes five to six minutes to start. This is on Azure and we only include two small Python libraries. Paying for 11 minutes of servers plus DBUs to run a 5-minute job isn't ideal, but we are stuck with it until we can address some Spark Connect breaking changes.

thackman_1-1720639616797.png

Several times each week, the job compute takes 20 to 35 minutes to start. Paying for 35 minutes of compute to run a 5-minute job is hard to justify.

thackman_0-1720639478363.png

Any idea why the cluster startup time is so variable?


5 REPLIES

jacovangelder
Honored Contributor

Are you on Azure or AWS? If Azure, it might be worth checking the event logs at the Databricks managed resource group level. Usually this is not Databricks related but cloud provider related.

Witold
Honored Contributor

30 minutes is definitely a very long time. Apart from that, there are techniques to lower the spin-up time, such as using pools or serverless compute.
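To illustrate the pool option, here is a minimal sketch of a job cluster spec that draws its nodes from an instance pool. The pool ID and runtime version are placeholders, not values from this thread, and `pooled_job_cluster` is just a hypothetical helper name. The field names (`instance_pool_id`, `spark_version`, `num_workers`) are the ones used in a Jobs API `new_cluster` block; when a pool is referenced, the node type comes from the pool, so `node_type_id` is left out.

```python
import json


def pooled_job_cluster(pool_id: str, spark_version: str, num_workers: int = 1) -> dict:
    """Build a `new_cluster` spec that takes its VMs from an instance pool.

    pool_id and spark_version are placeholders supplied by the caller.
    """
    if num_workers < 0:
        raise ValueError("num_workers must be >= 0")
    return {
        "spark_version": spark_version,
        # The node type is inherited from the pool, so node_type_id is omitted.
        "instance_pool_id": pool_id,
        "num_workers": num_workers,
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    spec = pooled_job_cluster("<your-pool-id>", "<your-runtime-version>")
    print(json.dumps(spec, indent=2))
```

Whether this helps depends on keeping some idle instances in the pool. As far as I know, those idle VMs are still billed by the cloud provider, so it is a trade-off rather than a free fix.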

One thing I once saw with a customer was that all the traffic was routed through an IDS/packet filter. The first sign of it was very low network throughput.

PSR100
New Contributor III

Sometimes there can be a delay in init script execution. But based on the screenshot, there are no init script logs, and as you mentioned, there are only 2 libraries to be installed on the cluster, so installation should not take much time.

The number of nodes is also small (1 driver and 1 worker), so this might be an issue with the cloud provider, where the delay happens while the VM is created and attached to the cluster. If it's a VNet-injected workspace and you are using custom DNS servers, you can check the DNS logs to see whether any delay is happening there.
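To narrow down which runs are affected, it can help to measure the gap between the cluster being requested and it reaching a running state. The sketch below assumes you have already fetched the cluster's event list (for example from the Clusters API events endpoint) as dicts with a `timestamp` in epoch milliseconds and a `type` such as `CREATING`, `STARTING`, or `RUNNING`. The function name and the sample events are made up for illustration.

```python
from typing import Iterable, Optional


def startup_minutes(events: Iterable[dict]) -> Optional[float]:
    """Minutes from the first CREATING/STARTING event to the next RUNNING event.

    Returns None if no complete start sequence is found.
    """
    ordered = sorted(events, key=lambda e: e["timestamp"])
    started_at = None
    for event in ordered:
        if event["type"] in ("CREATING", "STARTING") and started_at is None:
            started_at = event["timestamp"]
        elif event["type"] == "RUNNING" and started_at is not None:
            # Timestamps are in milliseconds; convert the gap to minutes.
            return (event["timestamp"] - started_at) / 60_000
    return None


if __name__ == "__main__":
    # Made-up events, not taken from the screenshots in this thread.
    sample = [
        {"timestamp": 1_720_000_000_000, "type": "CREATING"},
        {"timestamp": 1_720_000_360_000, "type": "RUNNING"},
    ]
    print(startup_minutes(sample))
```

Comparing these durations with the timestamps in the Azure activity log or the DNS logs should show whether the slow runs line up with VM provisioning or with name resolution.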

Regarding the cost for the cluster, as per my understanding, the DBU will be calculated only after the cluster is up and running. So until the cluster is started, the cost will not be calculated.

thackman
New Contributor II

It's good to know that DBU charges don't start until the cluster is operational. But we are still paying Azure charges for the VMs and other infrastructure during that whole startup phase.
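To put the overhead in numbers, here is a rough sketch using the figures from the original post (11 or 35 paid minutes for a 5-minute job). It only works in minutes, since VM prices vary; `startup_overhead` is a hypothetical helper, not part of any Databricks or Azure tooling.

```python
def startup_overhead(paid_minutes: float, job_minutes: float) -> dict:
    """Compare paid infrastructure time with the time the job actually runs."""
    if job_minutes <= 0 or paid_minutes < job_minutes:
        raise ValueError("expected 0 < job_minutes <= paid_minutes")
    idle = paid_minutes - job_minutes
    return {
        "idle_minutes": idle,
        # How many minutes are paid for each minute of useful work.
        "paid_per_useful_minute": paid_minutes / job_minutes,
        # Share of the paid time spent waiting for the cluster.
        "idle_share": idle / paid_minutes,
    }


if __name__ == "__main__":
    for label, paid in (("typical start", 11), ("slow start", 35)):
        result = startup_overhead(paid, 5)
        print(
            f"{label}: {result['idle_minutes']} idle min, "
            f"{result['paid_per_useful_minute']:.1f}x paid per useful minute, "
            f"{result['idle_share']:.0%} idle"
        )
```

With those inputs, a typical start means roughly 55% of the paid VM time is spent waiting, and a slow start pushes that to about 86%.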

Rishabh_Tiwari
Databricks Employee

Hi @thackman ,

Thank you for reaching out to our community! We're here to help you. 

To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedback not only helps us assist you better but also benefits other community members who may have similar questions in the future.

If you found the answer helpful, consider giving it a kudo. If the response fully addresses your question, please mark it as the accepted solution. This will help us close the thread and ensure your question is resolved.

We appreciate your participation and are here to assist you further if you need it!

Thanks,

Rishabh
