
Job compute is taking longer even after using pool

bhargavabasava — Thu, 03 Apr 2025 10:19:35 GMT

Hi team,

We created a workflow and attached it to a job cluster that is configured to use a compute pool. When we run the pipeline, it takes up to 5 minutes to reach the clusterReady state, which adds latency to our use case. Even on subsequent runs, it waits for the cluster to be ready. Can someone please help me understand how to reduce the overall latency, and whether there is a better way to use job compute?

We also tried a serverless warehouse (non-SQL), and it adds around 20-25 seconds of latency to each task in the job. In the screenshot (Screenshot 2025-04-03 ar 3:43:23 PM), the task took 33 seconds but the notebook cell ran for only 16 seconds. We would like to understand what accounts for the extra latency in this case.

Thanks & Regards,

Bhargava

Re: Job compute is taking longer even after using pool

Isi — Thu, 03 Apr 2025 14:22:37 GMT

Hey @bhargavabasava ,

Job Cluster + Compute Pools: Long Startup Times

If you’re using Job Clusters backed by compute pools, the initial delay (~5 minutes) is usually due to cluster provisioning. While compute pools are designed to reduce cold start times by pre-warming VMs, startup latency can still occur if:

There are no idle VMs available in the pool (e.g., the pool shows 0 instances in the idle state).
The cluster needs to install libraries or run init scripts, which adds to the boot time.
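To check the first condition, you can compare the number of idle instances in the pool with the number of nodes your job cluster asks for. The sketch below is only an illustration: it assumes the pool description is a dict with a `stats.idle_count` field, as I understand the Instance Pools API `get` response to look, and the pool name and counts are made-up sample values. Verify the field names against your workspace before relying on it.

```python
def idle_shortfall(pool: dict, nodes_needed: int) -> int:
    """Return how many instances would have to be freshly provisioned.

    `pool` is assumed to look like an Instance Pools API `get` response,
    where `stats.idle_count` is the number of warm, unused instances.
    """
    idle = pool.get("stats", {}).get("idle_count", 0)
    return max(0, nodes_needed - idle)


# Hypothetical pool description: nothing is kept warm.
pool = {
    "instance_pool_name": "etl-pool",  # made-up name
    "min_idle_instances": 0,
    "stats": {"idle_count": 0, "used_count": 0},
}

nodes_needed = 1 + 2  # one driver plus two workers (sample sizing)
shortfall = idle_shortfall(pool, nodes_needed)
print(f"{shortfall} of {nodes_needed} nodes must be provisioned from scratch")
```

If the shortfall is greater than zero, the run waits for new VMs even though the cluster is attached to a pool.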

Serverless Jobs Latency (~20–25 seconds overhead)

The behavior you’re seeing, where the notebook logic takes 16 seconds but the task duration is 33 seconds, is expected when using Serverless compute for Jobs (non-SQL). There is a small but consistent overhead (about 17 seconds in your screenshot) due to orchestration, environment setup, and logging.

That said, serverless jobs generally start much faster than job clusters and offer more predictable latency, so a 20–25 second overhead is considered normal.
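If you want to track this overhead across runs, a small helper like the one below is enough. It is just arithmetic on the two numbers from your screenshot (33 seconds of task duration, 16 seconds of cell execution); the function name is my own and not part of any Databricks API.

```python
def task_overhead(task_seconds: float, cell_seconds: float) -> tuple[float, float]:
    """Return (overhead in seconds, overhead as a fraction of the task duration)."""
    if task_seconds <= 0 or not 0 <= cell_seconds <= task_seconds:
        raise ValueError("expected 0 <= cell_seconds <= task_seconds and task_seconds > 0")
    overhead = task_seconds - cell_seconds
    return overhead, overhead / task_seconds


# Numbers taken from the screenshot described in the question.
overhead, share = task_overhead(task_seconds=33, cell_seconds=16)
print(f"overhead: {overhead:.0f}s ({share:.0%} of the task duration)")
```

For the run in the screenshot this gives 17 seconds of overhead, roughly half of the task duration, which is why short tasks feel the fixed cost the most.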

Suggestions to Reduce Latency

Use instance pools with Idle Instance Auto Termination set to ~10 minutes. Instances released by one run then stay warm in the pool, so a run that starts within that window can reuse them instead of paying the full provisioning time.
If you’re using isolated job clusters, try chaining multiple tasks in a single job using dependencies, with the tasks configured to share the same job cluster. This way, only the first task pays the cold-start penalty, and the following tasks run on the already-running cluster.
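As a rough sketch of both suggestions, here is how the request payloads could look if you create the pool and the job through the REST API. The field names follow the Instance Pools and Jobs 2.1 APIs as I understand them. The pool name, node type, Spark version, pool ID, notebook paths, and `min_idle_instances` value are placeholders I picked for illustration, so adjust them to your workspace.

```python
import json


def pool_payload(name: str, node_type_id: str, spark_version: str) -> dict:
    """Body for creating an instance pool that keeps VMs warm between runs."""
    return {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        # Assumption: keep enough idle instances for one driver and two workers.
        "min_idle_instances": 3,
        # Extra idle instances are released after ~10 minutes, as suggested above.
        "idle_instance_autotermination_minutes": 10,
        # Preloading the runtime avoids downloading it when the cluster starts.
        "preloaded_spark_versions": [spark_version],
    }


def job_payload(pool_id: str, spark_version: str) -> dict:
    """Body for a job whose tasks all run on one shared job cluster."""
    shared_key = "shared_cluster"
    return {
        "name": "pooled-pipeline",  # placeholder job name
        "job_clusters": [
            {
                "job_cluster_key": shared_key,
                "new_cluster": {
                    "spark_version": spark_version,
                    "instance_pool_id": pool_id,  # node type comes from the pool
                    "num_workers": 2,
                },
            }
        ],
        "tasks": [
            {
                "task_key": "ingest",
                "job_cluster_key": shared_key,
                "notebook_task": {"notebook_path": "/Jobs/ingest"},  # placeholder
            },
            {
                "task_key": "transform",
                # Runs after "ingest" and reuses the cluster that is already up.
                "depends_on": [{"task_key": "ingest"}],
                "job_cluster_key": shared_key,
                "notebook_task": {"notebook_path": "/Jobs/transform"},  # placeholder
            },
        ],
    }


# Placeholder values; replace with ones valid for your cloud and workspace.
pool = pool_payload("etl-pool", "i3.xlarge", "15.4.x-scala2.12")
job = job_payload("<your-pool-id>", "15.4.x-scala2.12")
print(json.dumps(job, indent=2))
```

Because both tasks point at the same `job_cluster_key`, only "ingest" should wait for the cluster to start. Note that keeping idle instances in the pool means paying the cloud provider for those VMs while they sit unused, so size `min_idle_instances` with that in mind.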

Hope this helps 🙂

Isi

Re: Job compute is taking longer even after using pool

bhargavabasava — Fri, 04 Apr 2025 16:03:35 GMT

Hey @Isi ,

Yeah this helps. Thanks a lot.

Re: Job compute is taking longer even after using pool

Isi — Fri, 04 Apr 2025 16:30:10 GMT

Hey @bhargavabasava ,

Happy to hear that! Consider marking my answer as the solution to help future users 🙂

Thanks,
Isi