<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Continuous workflow job creating new job clusters? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/continuous-workflow-job-creating-new-job-clusters/m-p/115610#M45124</link>
    <description>&lt;P&gt;Thank you all for your answers!&lt;/P&gt;&lt;P&gt;I did use dbutils.notebook.run() inside a while loop at first, but I ultimately ran into OOM errors, even when I cleared the cache after each iteration. I'm curious,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/159253"&gt;@RefactorDuncan&lt;/a&gt;: if you don't mind explaining, how did you break and restart?&lt;/P&gt;</description>
    <pubDate>Wed, 16 Apr 2025 05:41:35 GMT</pubDate>
    <dc:creator>jar</dc:creator>
    <dc:date>2025-04-16T05:41:35Z</dc:date>
    <item>
      <title>Continuous workflow job creating new job clusters?</title>
      <link>https://community.databricks.com/t5/data-engineering/continuous-workflow-job-creating-new-job-clusters/m-p/115391#M45078</link>
      <description>&lt;P&gt;Hey.&lt;/P&gt;&lt;P&gt;I am testing a continuous workflow job that repeatedly executes the same notebook. It is rather simple and works well, but it seems to re-create the job cluster for every iteration instead of re-using the one created on the first execution. Is that really the case? If so, is there a setting I am overlooking?&lt;/P&gt;&lt;P&gt;Best,&lt;/P&gt;&lt;P&gt;Johan.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Apr 2025 08:35:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/continuous-workflow-job-creating-new-job-clusters/m-p/115391#M45078</guid>
      <dc:creator>jar</dc:creator>
      <dc:date>2025-04-14T08:35:14Z</dc:date>
    </item>
    <item>
      <title>Re: Continuous workflow job creating new job clusters?</title>
      <link>https://community.databricks.com/t5/data-engineering/continuous-workflow-job-creating-new-job-clusters/m-p/115456#M45089</link>
      <description>&lt;P&gt;Hi&amp;nbsp;jar,&lt;/P&gt;&lt;P&gt;How are you doing today? As I understand it, your observation is correct: Databricks creates a new job cluster for each run of the job, even in a continuous workflow, unless you use an all-purpose cluster (which isn't ideal for cost or isolation in production). Job clusters are ephemeral by design: they spin up for the run and shut down once it's done, to guarantee a clean environment each time. There is currently no built-in setting that keeps the same job cluster alive across multiple runs of a continuous workflow. If you want to truly reuse a cluster across iterations, you would need to point your job at an existing all-purpose cluster manually, but that trades off isolation and increases the risk of leftover state between runs. For most use cases, letting the job cluster restart each time is safer, even if it adds some overhead. Let me know if you want to explore workflow alternatives to help minimize startup time!&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Brahma&lt;/P&gt;</description>
      <pubDate>Tue, 15 Apr 2025 04:26:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/continuous-workflow-job-creating-new-job-clusters/m-p/115456#M45089</guid>
      <dc:creator>Brahmareddy</dc:creator>
      <dc:date>2025-04-15T04:26:49Z</dc:date>
    </item>
    <item>
      <title>Re: Continuous workflow job creating new job clusters?</title>
      <link>https://community.databricks.com/t5/data-engineering/continuous-workflow-job-creating-new-job-clusters/m-p/115494#M45096</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/102548"&gt;@Brahmareddy&lt;/a&gt;&amp;nbsp;is right — I’ve encountered the same issue. Even when using a continuous job, I still experience the overhead of compute restarting after each run completes.&lt;/P&gt;&lt;P&gt;As a temporary workaround (until the more cost-effective serverless update is available), I’ve created a main notebook that uses &lt;STRONG&gt;dbutils.notebook.run&lt;/STRONG&gt; inside a while loop to handle orchestration. This loop runs continuously but breaks every few hours to force a compute restart. Because it's a single-task notebook set up as a continuous job, it immediately kicks off a new run after exiting.&lt;/P&gt;&lt;P&gt;I’ve also experimented with compute pools, but they seem to introduce a similar level of overhead.&lt;/P&gt;&lt;P&gt;This setup is far from ideal, but it works for now as we await future improvements from Databricks.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Apr 2025 10:26:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/continuous-workflow-job-creating-new-job-clusters/m-p/115494#M45096</guid>
      <dc:creator>RefactorDuncan</dc:creator>
      <dc:date>2025-04-15T10:26:24Z</dc:date>
    </item>
    <item>
      <title>Re: Continuous workflow job creating new job clusters?</title>
      <link>https://community.databricks.com/t5/data-engineering/continuous-workflow-job-creating-new-job-clusters/m-p/115499#M45100</link>
      <description>&lt;P&gt;&lt;SPAN&gt;use&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;dbutils.notebook.run&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;inside a while loop to handle orchestration&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 15 Apr 2025 10:43:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/continuous-workflow-job-creating-new-job-clusters/m-p/115499#M45100</guid>
      <dc:creator>Aviral-Bhardwaj</dc:creator>
      <dc:date>2025-04-15T10:43:16Z</dc:date>
    </item>
    <item>
      <title>Re: Continuous workflow job creating new job clusters?</title>
      <link>https://community.databricks.com/t5/data-engineering/continuous-workflow-job-creating-new-job-clusters/m-p/115610#M45124</link>
      <description>&lt;P&gt;Thank you all for your answers!&lt;/P&gt;&lt;P&gt;I did use dbutils.notebook.run() inside a while loop at first, but I ultimately ran into OOM errors, even when I cleared the cache after each iteration. I'm curious,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/159253"&gt;@RefactorDuncan&lt;/a&gt;: if you don't mind explaining, how did you break and restart?&lt;/P&gt;</description>
      <pubDate>Wed, 16 Apr 2025 05:41:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/continuous-workflow-job-creating-new-job-clusters/m-p/115610#M45124</guid>
      <dc:creator>jar</dc:creator>
      <dc:date>2025-04-16T05:41:35Z</dc:date>
    </item>
    <item>
      <title>Re: Continuous workflow job creating new job clusters?</title>
      <link>https://community.databricks.com/t5/data-engineering/continuous-workflow-job-creating-new-job-clusters/m-p/115615#M45128</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P class=""&gt;Below is an example code snippet illustrating my current approach. I use &lt;STRONG&gt;dbutils.notebook.exit&lt;/STRONG&gt; to terminate the notebook execution either when a predefined stop time is reached or after a set number of iterations of the &lt;STRONG&gt;while&lt;/STRONG&gt; loop.&lt;/P&gt;&lt;P class=""&gt;When &lt;STRONG&gt;dbutils.notebook.exit&lt;/STRONG&gt; is triggered, the job run stops. Since the job is on a &lt;STRONG&gt;continuous schedule&lt;/STRONG&gt;, a new job run starts automatically immediately afterward.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from datetime import datetime, timedelta

max_job_duration = 14400  # seconds before forcing a restart
num_max_run = 1000        # example cap on iterations before restarting
num_completed_run = 0
time_restart_job = datetime.now() + timedelta(seconds=max_job_duration)

while True:
   time_current = datetime.now()
   if time_current &amp;gt;= time_restart_job or num_completed_run &amp;gt;= num_max_run:
      # Exit the notebook so the continuous job starts a fresh run
      dbutils.notebook.exit(f"Exited notebook at {time_current}.")

   # ... per-iteration orchestration work goes here ...
   num_completed_run += 1&lt;/LI-CODE&gt;</description>
      <pubDate>Wed, 16 Apr 2025 08:26:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/continuous-workflow-job-creating-new-job-clusters/m-p/115615#M45128</guid>
      <dc:creator>RefactorDuncan</dc:creator>
      <dc:date>2025-04-16T08:26:28Z</dc:date>
    </item>
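The break-and-restart pattern described above can be sketched as a small, locally testable helper. This is only an illustration, not RefactorDuncan's exact code: `run_child` and `exit_notebook` are hypothetical stand-ins for `dbutils.notebook.run` and `dbutils.notebook.exit`, which exist only inside a Databricks notebook, and the duration/iteration limits are example values.

```python
from datetime import datetime, timedelta

def should_restart(now, restart_at, completed, max_runs):
    """True when the continuous job should exit, so the scheduler
    starts a fresh run (and therefore a fresh job cluster)."""
    return now >= restart_at or completed >= max_runs

def drive(run_child, exit_notebook, max_seconds=14400, max_runs=1000):
    """Orchestration loop: run the child workload repeatedly, then
    exit once the stop time or the iteration cap is reached."""
    restart_at = datetime.now() + timedelta(seconds=max_seconds)
    completed = 0
    while True:
        if should_restart(datetime.now(), restart_at, completed, max_runs):
            exit_notebook(f"Exited after {completed} runs.")
            return
        run_child()   # e.g. dbutils.notebook.run(<child notebook>, timeout)
        completed += 1
```

Inside a Databricks continuous job, `drive(lambda: dbutils.notebook.run(...), dbutils.notebook.exit)` would loop until a limit is hit, exit the run, and let the continuous schedule immediately start a new one.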
    <item>
      <title>Re: Continuous workflow job creating new job clusters?</title>
      <link>https://community.databricks.com/t5/data-engineering/continuous-workflow-job-creating-new-job-clusters/m-p/116162#M45255</link>
      <description>&lt;P&gt;Clever. Thank you for sharing!&lt;/P&gt;</description>
      <pubDate>Tue, 22 Apr 2025 09:11:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/continuous-workflow-job-creating-new-job-clusters/m-p/116162#M45255</guid>
      <dc:creator>jar</dc:creator>
      <dc:date>2025-04-22T09:11:06Z</dc:date>
    </item>
  </channel>
</rss>

