Job Cluster Reuse

SparkMan
New Contributor II

Hi, I have a job in which a single job cluster is shared by Task A and Task C. Between A and C, Task B runs for about 4 hours on a different, interactive cluster. The issue is that the job cluster does not terminate when Task A completes; instead it sits idle for those 4 hours. Is this expected behavior when reusing a job cluster? Do we need to make sure a job cluster is reused only by consecutive tasks?

TASK A -> TASK B -> TASK C
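For reference, a job like the one described above might look roughly like the following Jobs API 2.1-style settings, written as a Python dict. This is a sketch under assumptions: the task keys, notebook paths, cluster ID and cluster spec are hypothetical placeholders, and `tasks_per_job_cluster` is a hypothetical helper (not a Databricks API) that lists which tasks reference each shared `job_cluster_key`.

```python
from collections import defaultdict

# Hypothetical job definition shaped like a Databricks Jobs API 2.1 payload.
# Notebook paths, the cluster ID, spark_version and node_type_id are
# placeholders, not values from the original post.
JOB_SETTINGS = {
    "name": "a_b_c_pipeline",
    "job_clusters": [
        {
            "job_cluster_key": "shared_job_cluster",
            "new_cluster": {
                "spark_version": "<spark-version>",
                "node_type_id": "<node-type>",
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "task_a",
            "job_cluster_key": "shared_job_cluster",
            "notebook_task": {"notebook_path": "/Jobs/task_a"},
        },
        {
            "task_key": "task_b",
            "depends_on": [{"task_key": "task_a"}],
            # Task B runs on an existing interactive (all-purpose) cluster.
            "existing_cluster_id": "<interactive-cluster-id>",
            "notebook_task": {"notebook_path": "/Jobs/task_b"},
        },
        {
            "task_key": "task_c",
            "depends_on": [{"task_key": "task_b"}],
            "job_cluster_key": "shared_job_cluster",
            "notebook_task": {"notebook_path": "/Jobs/task_c"},
        },
    ],
}


def tasks_per_job_cluster(settings: dict) -> dict[str, list[str]]:
    """Map each job_cluster_key to the task_keys that run on it (hypothetical helper)."""
    usage: dict[str, list[str]] = defaultdict(list)
    for task in settings["tasks"]:
        key = task.get("job_cluster_key")
        if key is not None:  # tasks on an existing cluster have no job_cluster_key
            usage[key].append(task["task_key"])
    return dict(usage)


print(tasks_per_job_cluster(JOB_SETTINGS))  # {'shared_job_cluster': ['task_a', 'task_c']}
```

Here `shared_job_cluster` is referenced by two non-consecutive tasks, which is the situation the question asks about.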

More on job cluster reuse:

https://community.databricks.com/t5/technical-blog/maximizing-resource-utilisation-with-cluster-reu…


Accepted Solution

szymon_dybczak
Esteemed Contributor III

Hi @SparkMan ,

This is expected behavior with Databricks job cluster reuse unless you change your job/task configuration. See the following documentation entry:

[Screenshot of the referenced Databricks documentation entry]

So your flow looks like this:

Task A (job cluster) → Task B (interactive cluster) → Task C (job cluster)

If Task A and Task C share the same job cluster, the cluster stays alive and idle while Task B runs, because Databricks doesn’t consider the job cluster “done” until every task that refers to it (A and C) has run. That’s why you see about 4 hours of idle time.
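A simplified model of the behavior described above, assuming (as the answer states) that a shared job cluster stays up from the first task that uses it until the last one finishes. The timings are hypothetical apart from Task B's roughly 4 hours, `TaskRun` and `idle_hours` are illustrative names rather than Databricks APIs, and cluster startup time is ignored.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    task_key: str
    cluster: str    # job_cluster_key, or a label for a non-job cluster
    start_h: float  # hypothetical start time, hours since the job started
    end_h: float    # hypothetical end time, hours since the job started


def idle_hours(runs: list[TaskRun], cluster: str) -> float:
    """Idle time of one job cluster under a simplified lifetime model.

    The cluster is assumed to live from the earliest start to the latest
    end among the tasks that reference it; idle time is that lifetime
    minus the time during which at least one of those tasks is running.
    """
    spans = sorted((r.start_h, r.end_h) for r in runs if r.cluster == cluster)
    if not spans:
        return 0.0
    lifetime = max(end for _, end in spans) - spans[0][0]

    # Merge overlapping task intervals so parallel tasks are not double-counted.
    busy = 0.0
    cur_start, cur_end = spans[0]
    for start, end in spans[1:]:
        if start <= cur_end:
            cur_end = max(cur_end, end)
        else:
            busy += cur_end - cur_start
            cur_start, cur_end = start, end
    busy += cur_end - cur_start
    return lifetime - busy


# A and C share one job cluster; B runs on the interactive cluster.
shared = [
    TaskRun("task_a", "shared_job_cluster", 0.0, 0.5),
    TaskRun("task_b", "interactive", 0.5, 4.5),
    TaskRun("task_c", "shared_job_cluster", 4.5, 5.0),
]

# Alternative: A and C each get their own job_cluster_key.
split = [
    TaskRun("task_a", "cluster_a", 0.0, 0.5),
    TaskRun("task_b", "interactive", 0.5, 4.5),
    TaskRun("task_c", "cluster_c", 4.5, 5.0),
]

print(idle_hours(shared, "shared_job_cluster"))  # 4.0
print(idle_hours(split, "cluster_a"), idle_hours(split, "cluster_c"))  # 0.0 0.0
```

In this model, giving Task A and Task C their own `job_cluster_key` removes the idle window, at the cost of starting a second job cluster for Task C.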
