Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Job compute is taking longer even after using pool

bhargavabasava
New Contributor II

Hi team,

We created a workflow and attached it to a job cluster (configured to use a compute pool). When we run the pipeline, it takes up to 5 minutes for the cluster to reach the clusterReady state, which adds latency to our use case. Even on subsequent runs, it waits for the cluster to be ready. Can someone please help me understand how to reduce the overall latency and the best way to use job compute?

We also tried serverless compute (non-SQL), and it adds around 20-25 seconds of latency to each task in the job. In the screenshot (Screenshot 2025-04-03 at 3:43:23 PM), the task took 33 seconds but the notebook cell ran for only 16 seconds. I would like to understand what is adding to the latency in this case.

Thanks & Regards,

Bhargava

3 REPLIES

Isi
Contributor III

Hey @bhargavabasava ,

Job Cluster + Compute Pools: Long Startup Times

If you’re using Job Clusters backed by compute pools, the initial delay (~5 minutes) is usually due to cluster provisioning. While compute pools are designed to reduce cold start times by pre-warming VMs, startup latency can still occur if:

  • There are no idle VMs available in the pool (e.g., 0 instances in the idle state); see the pool sketch after this list.

  • The cluster needs to install libraries or run init scripts, which adds to the boot time.
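
On the first point, here is a minimal sketch of creating a pool with a couple of warm instances using the Databricks Python SDK (databricks-sdk). The pool name, node type, and sizes below are placeholders rather than values from your setup; keeping min_idle_instances above 0 is what lets job clusters skip most of the provisioning wait.

```python
# Hedged sketch: create an instance pool that keeps a few VMs warm between job runs.
# Assumes databricks-sdk is installed and workspace authentication is already configured.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

pool = w.instance_pools.create(
    instance_pool_name="jobs-warm-pool",        # placeholder name
    node_type_id="Standard_DS3_v2",             # placeholder node type (Azure example)
    min_idle_instances=2,                       # keep 2 warm VMs so job clusters skip provisioning
    max_capacity=10,
    idle_instance_autotermination_minutes=10,   # release extra idle VMs after ~10 minutes
)
print(pool.instance_pool_id)  # reference this id from the job cluster spec
```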

Serverless Jobs Latency (~20–25 seconds overhead)

The behavior you’re seeing, where the notebook logic runs for 16 seconds but the task duration is 33 seconds, is expected when using serverless compute for Jobs (non-SQL). There is a small but consistent overhead from orchestration, environment setup, and logging.

That said, serverless jobs generally start much faster than job clusters and offer more predictable latency, so a 20–25 second overhead is considered normal.

Suggestions to Reduce Latency

  • Use instance pools with Idle Instance Auto Termination set to ~10 minutes. This allows reusing VMs across runs without incurring full provisioning times.

  • If you’re using isolated job clusters, chain multiple tasks into a single job using dependencies so that only the first task pays the cold-start penalty and the subsequent tasks run on the same cluster (see the job sketch below).
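
To make the second point concrete, here is a hedged sketch (again using the Databricks Python SDK, with placeholder notebook paths, Spark version, and pool id) of a single job whose tasks all reference one shared job cluster drawn from the pool above. Only the first task waits for the cluster to start; the dependent task reuses it.

```python
# Hedged sketch: one job, one shared job cluster backed by an instance pool, chained tasks.
# Notebook paths, the Spark version, and the pool id are placeholders for illustration.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

job = w.jobs.create(
    name="chained-tasks-shared-cluster",
    job_clusters=[
        jobs.JobCluster(
            job_cluster_key="shared",
            new_cluster=compute.ClusterSpec(
                spark_version="15.4.x-scala2.12",        # placeholder DBR version
                num_workers=2,
                instance_pool_id="<pool-id-from-above>", # draw workers from the warm pool
            ),
        )
    ],
    tasks=[
        jobs.Task(
            task_key="ingest",
            job_cluster_key="shared",
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/pipeline/ingest"),
        ),
        jobs.Task(
            task_key="transform",
            depends_on=[jobs.TaskDependency(task_key="ingest")],
            job_cluster_key="shared",                    # same cluster, no extra startup wait
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/pipeline/transform"),
        ),
    ],
)
print(job.job_id)
```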


Hope this helps 🙂

Isi

bhargavabasava
New Contributor II

Hey @Isi ,

Yeah this helps. Thanks a lot.

Isi
Contributor III

Hey @bhargavabasava ,

Happy to hear that! Consider marking my answer as the solution so it can help future users 🙂

Thanks,
Isi
