
Can on-demand clusters be shared across multiple jobs using a cluster pool with max capacity?

New Contributor III

I have a cluster pool with max capacity. I run multiple jobs against that cluster pool.

Can on-demand clusters, created within this cluster pool, be shared across multiple different jobs, at the same time?

The reason I'm asking is that I can see a degradation in the execution time of a specific job in the PRD env, where other jobs are running using the same cluster pool.

If I run the same job in UAT, where there are no other job runs, the job is done within 40 minutes, but it takes 90-120 minutes in PRD.

All the setup is the same: autoscaling cluster, the same node type id, the same data.

What could be the reason?



Community Manager

Hi @radothede,

Cluster Pools and On-Demand Clusters: In Azure Databricks, a pool is a set of idle, ready-to-use instances that clusters draw their driver and worker nodes from, and one pool can serve multiple users or jobs. Instead of giving each user their own dedicated cluster, you create a pool of clusters that can be u... [1]. The instances in a pool can be either on-demand or preemptible (spot).

  • On-demand instances are provisioned when needed and are not reclaimed by the cloud provider while in use. They are suitable for jobs with strict execution time requirements.
  • Preemptible (spot) instances can be interrupted if the cloud provider needs the capacity elsewhere. They are cost-effective but less suitable for long-running jobs.

Cluster Reuse and Resource Utilization: Now, let’s address your specific scenario. You mentioned that you’re experiencing longer execution times for a specific job in the PRD environment compared to UAT, even though the setup (autoscaling cluster, node type, and data) is the same.

Here are a few factors to consider:

  1. Cluster Reuse: By enabling cluster reuse, a single cluster can be shared across multiple tasks within the same job. This approach provides better resource utilization and minimizes the time taken for tasks to start. Essentially, you reduce the overhead of creating and terminating clusters within a job [2]. If your PRD environment is not reusing clusters effectively, it could lead to longer execution times.

  2. Resource Contention: When multiple jobs share the same cluster pool, there might be resource contention. If one job is resource-intensive and holds many of the pool's instances, it can impact other jobs drawing from the same pool. With a pool that has a max capacity, an autoscaling cluster may also be unable to acquire additional workers while other jobs hold the instances. Check if any other jobs are running concurrently during the execution of your specific job in PRD.

  3. Scaling Behavior: Autoscaling clusters adjust their size based on workload demands. However, the scaling behavior might differ between UAT and PRD. Ensure that the autoscaling configuration (minimum and maximum workers) is consistent across both environments.

  4. Data Volume and Distribution: Verify if the data volume processed by the specific job differs between UAT and PRD. Larger datasets can affect execution time. Additionally, data distribution across partitions can impact performance.

  5. Cluster Tags and Billing: Use pool tags and cluster tags to manage billing. Pre-populate pools (keep a minimum number of idle instances) to ensure instances are available when clusters are needed. Insufficient pre-population means a cluster has to wait for new instances to be provisioned, which could lead to delays.

  6. Network Latency: Consider network latency between the cluster and data sources (e.g., databases, storage). PRD might have different network conditions than UAT.
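The resource-contention point (item 2) can be checked against the pool's own numbers. The following is a minimal sketch, assuming the pool description has the shape returned by the Instance Pools API: a `max_capacity` field plus a `stats` object with `used_count`, `idle_count`, `pending_used_count` and `pending_idle_count`. The function name `available_instances`, the sample snapshot and the worker count are hypothetical and only illustrate the arithmetic.

```python
from typing import Any, Dict


def available_instances(pool: Dict[str, Any]) -> int:
    """Instances the pool can still hand to clusters before reaching max_capacity.

    Idle instances count toward max_capacity but are still available,
    so only used and pending-used instances are subtracted.
    """
    max_capacity = pool.get("max_capacity")
    if max_capacity is None:
        raise ValueError("pool has no max_capacity configured")
    stats = pool.get("stats", {})
    claimed = stats.get("used_count", 0) + stats.get("pending_used_count", 0)
    return max(max_capacity - claimed, 0)


# Hypothetical snapshot of a busy PRD pool (values invented for illustration).
prd_pool = {
    "max_capacity": 20,
    "stats": {
        "used_count": 17,
        "idle_count": 1,
        "pending_used_count": 2,
        "pending_idle_count": 0,
    },
}

# Assumed autoscaling ceiling of the slow job (hypothetical).
workers_wanted = 8

free = available_instances(prd_pool)
shortfall = max(workers_wanted - free, 0)
print(f"free instances: {free}, shortfall for the job: {shortfall}")
```

If the shortfall is regularly above zero while the job runs in PRD, the job is probably running with fewer workers than it does in UAT, which would be consistent with the longer runtime.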

Next Steps:

  1. Check the cluster utilization and resource allocation during job execution in PRD. Look for signs of resource contention.
  2. Monitor the cluster’s scaling behavior and verify that it aligns with your expectations.
  3. Review the job logs for any specific issues related to data processing or resource bottlenecks.
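One way to act on step 2 is to compare how far the job cluster actually scaled in each environment. The sketch below assumes the cluster events have already been fetched (for example from the cluster event log) as a list of dictionaries whose `details` carry `current_num_workers` and `target_num_workers`. The helper name `peak_workers` and both event lists are invented for illustration.

```python
from typing import Any, Dict, List


def peak_workers(events: List[Dict[str, Any]]) -> int:
    """Largest worker count reported across the given cluster events."""
    peak = 0
    for event in events:
        current = event.get("details", {}).get("current_num_workers")
        if current is not None:
            peak = max(peak, current)
    return peak


# Hypothetical event logs for one run of the same job in each environment.
uat_events = [
    {"type": "RESIZING", "details": {"current_num_workers": 2, "target_num_workers": 8}},
    {"type": "UPSIZE_COMPLETED", "details": {"current_num_workers": 8, "target_num_workers": 8}},
]
prd_events = [
    {"type": "RESIZING", "details": {"current_num_workers": 2, "target_num_workers": 8}},
    {"type": "UPSIZE_COMPLETED", "details": {"current_num_workers": 3, "target_num_workers": 8}},
]

print("UAT peak workers:", peak_workers(uat_events))
print("PRD peak workers:", peak_workers(prd_events))
```

A PRD peak well below the UAT peak, with the same autoscaling settings, would suggest that the cluster could not obtain the workers it asked for.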

@Kaniz_Fatma Thanks a lot for your extensive reply, it is very insightful.

Regarding the factors you mentioned above:

1. This is a single-task job, so this does not apply here.

2. Of course, there are a lot of jobs running using the same cluster pool. Could you please elaborate on this one? Does it mean that different jobs are capable of using the same clusters if they point to the same cluster pool? In other words, is it possible that 2 or more jobs are using the same job cluster?

3. The same setup.

4. The same.

5. The same setup, no such impact here, even if not pre-populated.

6. The same region and, I guess, the same setup. There is no impact on other jobs running on PRD.
