Databricks Community

radothede · 05-02-2024

I have a cluster pool with max capacity. I run multiple jobs against that cluster pool.

Can on-demand clusters created within this cluster pool be shared across multiple different jobs at the same time?

The reason I'm asking is that I see a degradation in the execution time of a specific job in the PRD environment, where other jobs are running against the same cluster pool.

If I run the same job in UAT, where there are no other job runs, it finishes within 40 minutes, but it takes 90-120 minutes in PRD.

The setup is identical: an autoscaling cluster, the same node type id, the same data.

What could be the reason?

radothede · 05-04-2024

@Retired_mod Thanks a lot for your extensive reply, it is very insightful.

Regarding the factors you mentioned above:

1. this is a single-task job, so this does not apply here,

2. Of course, there are a lot of jobs running against the same cluster pool. Could you please elaborate on this one? Does it mean that different jobs can use the same clusters if they point to the same cluster pool? In other words, is it possible that 2 or more jobs are using the same job cluster?

3. the same setup,

4. the same,

5. the same setup, no such impact here, even if not pre-populated,

6. the same region, I guess the same setup - no impact on other jobs running on PRD.