Databricks cluster random slow start times.

thackman
New Contributor

We have a job that runs on single-user job compute because we've had compatibility issues switching to shared compute.

Normally the cluster (1 driver, 1 worker) takes five to six minutes to start. This is on Azure, and we only include two small Python libraries. Paying for 11 minutes of servers plus DBUs to run a 5-minute job isn't ideal, but we are stuck with it until we can address some Spark Connect breaking changes.

[Screenshot]

Several times each week, the job compute takes 20 to 35 minutes to start. Paying for 35 minutes of compute to run a 5-minute job is hard to justify.
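To put those numbers side by side, here is a small Python sketch of the share of billed time that actually goes to the job. It assumes, as the post does, that every startup minute is billed; `useful_fraction` is a hypothetical helper, and the 6- and 30-minute startup times are illustrative values taken from the ranges above.

```python
def useful_fraction(startup_min: float, job_min: float) -> float:
    """Share of billed cluster time spent running the job itself (hypothetical helper).

    Assumes every startup minute is billed, which is how the post frames the cost.
    """
    total = startup_min + job_min
    if total <= 0:
        raise ValueError("startup_min + job_min must be positive")
    return job_min / total


print(round(useful_fraction(6, 5), 2))   # 0.45  (typical start: ~11 billed minutes)
print(round(useful_fraction(30, 5), 2))  # 0.14  (slow start: ~35 billed minutes)
```

In the slow case, roughly six of every seven billed minutes are spent waiting for the cluster.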

[Screenshot]

Any idea why the cluster startup time is so variable?

3 Replies

jacovangelder
Contributor III

Are you on Azure or AWS? If Azure, it might be worth checking the event logs at the level of the Databricks managed resource group. Usually this is not a Databricks issue but a cloud provider issue.

Witold
New Contributor III

30 minutes is definitely a very long time. That said, there are techniques to reduce spin-up time, such as using pools or serverless clusters.
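As a rough illustration of the pools suggestion, the sketch below builds request bodies for an instance pool and for a job cluster that takes both its driver and its worker from that pool. The field names follow the Databricks Instance Pools and Clusters APIs as I understand them; the helper functions are hypothetical, and the pool name, pool ID, VM type, and runtime version are placeholders. It only builds dictionaries and does not call any API.

```python
def pool_spec(name: str, node_type_id: str, min_idle: int = 1, idle_minutes: int = 30) -> dict:
    """Request body for creating an instance pool (hypothetical helper)."""
    if min_idle < 0:
        raise ValueError("min_idle must be non-negative")
    return {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        # Idle VMs kept warm so a new cluster can skip VM provisioning.
        "min_idle_instances": min_idle,
        # Idle VMs beyond min_idle are released after this many idle minutes.
        "idle_instance_autotermination_minutes": idle_minutes,
    }


def job_cluster_spec(spark_version: str, pool_id: str, num_workers: int = 1) -> dict:
    """new_cluster block for a job cluster drawing its nodes from a pool (hypothetical helper)."""
    return {
        "spark_version": spark_version,
        "num_workers": num_workers,
        "instance_pool_id": pool_id,         # worker(s) come from the pool
        "driver_instance_pool_id": pool_id,  # driver comes from the same pool
        # node_type_id is omitted on purpose: the pool determines the VM type.
    }


# Placeholder values for illustration only.
pool = pool_spec("warm-job-pool", "Standard_DS3_v2", min_idle=2)
cluster = job_cluster_spec("15.4.x-scala2.12", "hypothetical-pool-id")
print(pool)
print(cluster)
```

Keeping `min_idle_instances` above zero is what lets a cluster skip VM provisioning. Idle pool instances do not incur DBU charges, but the cloud provider still bills for the VMs, so the pool size is a trade-off against the startup delay.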

One thing I once saw at a customer was that all traffic was routed through an IDS/packet filter. The first sign was very low network throughput.

PSR100
New Contributor

Sometimes there can be a delay in init script execution. But based on the screenshot, there are no init script logs, and, as you mentioned, only two libraries are installed on the cluster, so installation should not take long.

Also, the number of nodes is small (1 driver and 1 worker), so this might be an issue with the cloud provider, where the delay happens while the VMs are created and attached to the cluster. If it's a VNet-injected workspace and you are using custom DNS servers, you can check the DNS logs to see whether any delay is happening there.
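To see where the time actually goes, one option is to pull the cluster's event log (each event carries a `type` and an epoch-millisecond `timestamp`; with the Databricks Python SDK this is something like `WorkspaceClient().clusters.events(cluster_id=...)`, assumed here) and look for the longest gap between consecutive events. The sketch below works on already-fetched events; the `Event` class and both helpers are hypothetical, and the sample event types and timestamps are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    """Minimal stand-in for one cluster event (hypothetical class)."""
    type: str       # event type string, e.g. "CREATING" or "RUNNING"
    timestamp: int  # epoch milliseconds, as in the cluster event log


Phase = Tuple[str, str, float]  # (from_type, to_type, seconds between them)


def startup_phases(events: List[Event]) -> List[Phase]:
    """Return the gap in seconds between each pair of consecutive events."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return [
        (prev.type, cur.type, (cur.timestamp - prev.timestamp) / 1000)
        for prev, cur in zip(ordered, ordered[1:])
    ]


def slowest_phase(events: List[Event]) -> Phase:
    """Return the longest gap between consecutive events."""
    phases = startup_phases(events)
    if not phases:
        raise ValueError("need at least two events to measure a gap")
    return max(phases, key=lambda p: p[2])


# Invented sample: a 27-minute start where most of the time passes early on.
sample = [
    Event("CREATING", 0),
    Event("DRIVER_HEALTHY", 25 * 60 * 1000),
    Event("RUNNING", 27 * 60 * 1000),
]
print(slowest_phase(sample))  # ('CREATING', 'DRIVER_HEALTHY', 1500.0)
```

Comparing the slowest phase across a normal run and a slow run would show whether the extra time sits in VM acquisition or in later steps such as library installation.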

Regarding the cost of the cluster: as I understand it, DBUs are calculated only after the cluster is up and running, so no DBU cost accrues while the cluster is starting.
