Job Cluster Reuse

SparkMan
Databricks Partner

Hi, I have a job where a job cluster is reused for two tasks, Task A and Task C. Between A and C, Task B runs for 4 hours on a different interactive cluster. The issue is that the job cluster doesn't terminate as soon as Task A completes; instead it sits idle for 4 hours. Is this expected behaviour when reusing a job cluster? Do we need to make sure that a job cluster is reused only for consecutive tasks?

TASK A -> TASK B -> TASK C

More on job cluster reuse:

https://community.databricks.com/t5/technical-blog/maximizing-resource-utilisation-with-cluster-reu…

szymon_dybczak
Esteemed Contributor III

Hi @SparkMan ,

This is expected behavior with Databricks job cluster reuse unless you change your job/task configuration. See the following documentation entry:

[Screenshot of the documentation entry: szymon_dybczak_0-1768565474054.png]

So with your flow you have something like this:

Task A (job cluster) → Task B (interactive cluster) → Task C (job cluster)

If Task A and Task C share the same job cluster, the cluster stays alive and idle while Task B runs, because Databricks doesn’t consider the job cluster “done” until all tasks that refer to it (A and C) have run. That’s why you see ~4 hours of idle time.
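
One way to avoid the idle window is to give Task A and Task C separate job clusters, so that nothing still refers to A's cluster once A finishes. Below is a minimal sketch of such job settings as a Python dict, assuming the Jobs API 2.1 settings shape (`job_clusters`, `job_cluster_key`, `existing_cluster_id`, `depends_on`). The cluster keys, notebook paths, node type, Spark version, and interactive cluster ID are placeholders I made up for illustration, not values from your job.

```python
import json

# Placeholder cluster spec: the Spark version, node type, and worker count
# are illustrative assumptions, not taken from the original job.
NEW_CLUSTER = {
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
}


def build_job_settings(interactive_cluster_id):
    """Build job settings where Task A and Task C use different job clusters."""
    return {
        "name": "a-b-c-separate-job-clusters",
        "job_clusters": [
            # One job cluster per task, so no cluster is shared across Task B.
            {"job_cluster_key": "cluster_for_a", "new_cluster": dict(NEW_CLUSTER)},
            {"job_cluster_key": "cluster_for_c", "new_cluster": dict(NEW_CLUSTER)},
        ],
        "tasks": [
            {
                "task_key": "task_a",
                "job_cluster_key": "cluster_for_a",
                "notebook_task": {"notebook_path": "/Jobs/task_a"},
            },
            {
                "task_key": "task_b",
                "depends_on": [{"task_key": "task_a"}],
                # Task B keeps running on the existing interactive cluster.
                "existing_cluster_id": interactive_cluster_id,
                "notebook_task": {"notebook_path": "/Jobs/task_b"},
            },
            {
                "task_key": "task_c",
                "depends_on": [{"task_key": "task_b"}],
                "job_cluster_key": "cluster_for_c",
                "notebook_task": {"notebook_path": "/Jobs/task_c"},
            },
        ],
    }


def tasks_per_job_cluster(settings):
    """Map each job cluster key to the tasks that refer to it."""
    usage = {}
    for task in settings["tasks"]:
        key = task.get("job_cluster_key")
        if key is not None:
            usage.setdefault(key, []).append(task["task_key"])
    return usage


if __name__ == "__main__":
    settings = build_job_settings("0000-000000-example")
    print(json.dumps(settings, indent=2))
    print(tasks_per_job_cluster(settings))
```

The trade-off is that Task C now waits for a fresh cluster to start instead of reusing a warm one, so this is mainly worthwhile when the gap between the two tasks is long, as with the 4-hour Task B here. `tasks_per_job_cluster` is just a helper for spotting a job cluster key that is shared by non-consecutive tasks.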
