04-14-2025 01:35 AM
Hey.
I am testing a continuous workflow job which executes the same notebook, so it's rather simple, and it works well. However, it seems to re-create the job cluster for every iteration instead of re-using the one created at the first execution. Is that really the case? If so, is there a setting I am overlooking?
Best,
Johan.
04-14-2025 09:26 PM
Hi jar,
How are you doing today? As per my understanding, you're absolutely right in your observation: Databricks will create a new job cluster for each run of the job, even in a continuous workflow, unless you're using an all-purpose cluster (which isn't ideal for cost or isolation in production). Job clusters are ephemeral by design; they spin up for the run and shut down once it's done, to ensure a clean environment each time. Right now, there's no built-in setting to keep the same job cluster alive across multiple runs in a looped workflow.
If you want to truly reuse a cluster across iterations, you'd need to point your job at an existing all-purpose cluster manually, but that trades off isolation and increases the risk of leftover state between runs. For most use cases, letting the job cluster restart each time is safer, even if it adds some overhead. Let me know if you want to explore workflow alternatives to help minimize startup time!
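For illustration, here is a rough sketch of the two compute options as task settings in the Jobs API 2.1 style. All names, paths, IDs, and sizes below are placeholders, not recommendations:

# Option A: ephemeral job cluster, recreated for every run (the behavior observed here).
task_with_job_cluster = {
    "task_key": "main",
    "notebook_task": {"notebook_path": "/Workspace/Users/me/my_notebook"},  # placeholder
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",  # example runtime
        "node_type_id": "Standard_DS3_v2",    # example node type
        "num_workers": 2,
    },
}

# Option B: an existing all-purpose cluster, reused across runs
# (faster starts, but weaker isolation and possible leftover state).
task_with_all_purpose_cluster = {
    "task_key": "main",
    "notebook_task": {"notebook_path": "/Workspace/Users/me/my_notebook"},  # placeholder
    "existing_cluster_id": "1234-567890-abcde123",  # placeholder cluster ID
}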
Regards,
Brahma
04-15-2025 03:26 AM
Hi,
@Brahmareddy is right; I've encountered the same issue. Even when using a continuous job, I still experience the overhead of compute restarting after each run completes.
As a temporary workaround (until the more cost-effective serverless update is available), I've created a main notebook that uses dbutils.notebook.run inside a while loop to handle orchestration. This loop runs continuously but breaks every few hours to force a compute restart. Because it's a single-task notebook set up as a continuous job, it immediately kicks off a new run after exiting. Roughly, the driver loop looks like the sketch below; the worker notebook path, timeout, and restart window are placeholders:
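from datetime import datetime, timedelta

restart_at = datetime.now() + timedelta(hours=4)  # force a compute restart every ~4 hours

while datetime.now() < restart_at:
    # Run the worker notebook synchronously; 3600 s is an arbitrary per-run timeout.
    dbutils.notebook.run("/Workspace/Users/me/worker", 3600)

# Ending this run lets the continuous schedule start a fresh run on new compute.
dbutils.notebook.exit(f"Exiting for restart at {datetime.now()}.")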
I've also experimented with compute pools, but they seem to introduce a similar level of overhead.
This setup is far from ideal, but it works for now as we await future improvements from Databricks.
04-15-2025 03:43 AM
"use dbutils.notebook.run inside a while loop to handle orchestration"
04-15-2025 10:41 PM
Thank you all for your answers!
I did use dbutils.notebook.run() inside a while loop at first, but I ultimately ran into OOM errors, even when I tried clearing the cache after each iteration. I'm curious, @RefactorDuncan, if you don't mind explaining: how did you break and restart?
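For context, the per-iteration cleanup I attempted looked roughly like this (worker path and timeout are placeholders):

result = dbutils.notebook.run("/Workspace/Users/me/worker", 3600)
spark.catalog.clearCache()  # drop cached tables/DataFrames between iterations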
04-16-2025 01:26 AM
Hi,
Below is an example code snippet illustrating my current approach. I use dbutils.notebook.exit to terminate the notebook execution either when a predefined stop time is reached or after a set number of iterations in the while loop.
When dbutils.notebook.exit is triggered, the job run stops. Since the job is set on a continuous schedule, a new job run is automatically started immediately afterward.
from datetime import datetime, timedelta

max_job_duration = 14400  # seconds before forcing a job restart
num_max_run = 100         # max loop iterations before forcing a restart (example value)
num_completed_run = 0
time_restart_job = datetime.now() + timedelta(seconds=max_job_duration)

while True:
    time_current = datetime.now()
    if time_current >= time_restart_job or num_completed_run >= num_max_run:
        # Exit the notebook so the continuous schedule can start a fresh run
        dbutils.notebook.exit(f"Exited notebook at {time_current}.")
    # ... one unit of work per iteration goes here ...
    num_completed_run += 1
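For completeness, the piece that makes the restart seamless is the job's continuous trigger. A rough sketch of the relevant job settings in the Jobs API 2.1 style (all names and paths are placeholders):

job_settings = {
    "name": "my-continuous-job",
    "continuous": {"pause_status": "UNPAUSED"},  # a new run starts as soon as one ends
    "tasks": [
        {
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Workspace/Users/me/driver_notebook"},
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }
    ],
}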
04-22-2025 02:11 AM
Clever. Thank you for sharing!