Databricks cluster start times are randomly slow

thackman
Databricks Partner

We have a job that runs on single-user job compute because we've had compatibility issues switching to shared compute.

Normally the cluster (1 driver, 1 worker) takes five to six minutes to start. This is on Azure, and we only install two small Python libraries. Paying for 11 minutes of servers plus DBUs to run a 5-minute job isn't ideal, but we are stuck with it until we can address some Spark Connect breaking changes.

thackman_1-1720639616797.png

Several times each week, the job compute takes 20 to 35 minutes to start. Paying for as much as 35 minutes of startup to run a 5-minute job is hard to justify.

thackman_0-1720639478363.png
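
To put rough numbers on the overhead, here is a minimal sketch. It assumes billed time is simply startup time plus run time, which is a simplification of how Azure VM and DBU charges actually accrue. The helper name `startup_share` is made up for illustration, and the minute values are the ones quoted above.

```python
def startup_share(startup_min: float, run_min: float) -> float:
    """Fraction of billed minutes spent waiting for the cluster to start.

    Assumption: billed minutes = startup minutes + run minutes.
    """
    return startup_min / (startup_min + run_min)


RUN_MIN = 5  # approximate job runtime from the post

# Startup times quoted in the post: about 6 minutes normally, 20 to 35 on bad days.
cases = [("typical", 6), ("slow, low end", 20), ("slow, high end", 35)]

for label, startup_min in cases:
    total = startup_min + RUN_MIN
    share = startup_share(startup_min, RUN_MIN)
    print(f"{label}: {total} min billed, {share:.0%} of it startup")
```

Even in the normal case, more than half of the billed time is startup under this assumption. On the slow days it is most of it.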

Any idea why the cluster startup time is so variable?