<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Run failed with error message  Cluster was terminated. Reason: JOB_FINISHED (SUCCESS) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/run-failed-with-error-message-cluster-was-terminated-reason-job/m-p/144837#M52394</link>
    <description>&lt;P&gt;I am running a notebook through a workflow using an all-purpose cluster ("data_security_mode": "USER_ISOLATION"). I am seeing some strange behaviour from the cluster during the run: while the job is still running, the cluster gets terminated with the reason JOB_FINISHED (SUCCESS). This causes the running job to fail with the error "cluster was terminated". I am not able to find any details in the cluster event log or driver log.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 22 Jan 2026 06:36:17 GMT</pubDate>
    <dc:creator>holychs</dc:creator>
    <dc:date>2026-01-22T06:36:17Z</dc:date>
    <item>
      <title>Run failed with error message  Cluster was terminated. Reason: JOB_FINISHED (SUCCESS)</title>
      <link>https://community.databricks.com/t5/data-engineering/run-failed-with-error-message-cluster-was-terminated-reason-job/m-p/144837#M52394</link>
      <description>&lt;P&gt;I am running a notebook through a workflow using an all-purpose cluster ("data_security_mode": "USER_ISOLATION"). I am seeing some strange behaviour from the cluster during the run: while the job is still running, the cluster gets terminated with the reason JOB_FINISHED (SUCCESS). This causes the running job to fail with the error "cluster was terminated". I am not able to find any details in the cluster event log or driver log.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Jan 2026 06:36:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/run-failed-with-error-message-cluster-was-terminated-reason-job/m-p/144837#M52394</guid>
      <dc:creator>holychs</dc:creator>
      <dc:date>2026-01-22T06:36:17Z</dc:date>
    </item>
    <item>
      <title>Re: Run failed with error message  Cluster was terminated. Reason: JOB_FINISHED (SUCCESS)</title>
      <link>https://community.databricks.com/t5/data-engineering/run-failed-with-error-message-cluster-was-terminated-reason-job/m-p/144858#M52397</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/135150"&gt;@holychs&lt;/a&gt;&amp;nbsp;- Well, this behaviour needs troubleshooting, I imagine.&lt;/P&gt;&lt;P&gt;- What is the auto-termination value? Try increasing it to a much higher value and observe whether the behaviour is the same.&lt;/P&gt;&lt;P&gt;- Does your workflow have multiple notebook tasks? If Task A finishes while Task B is still running, a glitch in the job context can sometimes trigger a cluster teardown if the cluster was pinned to the job.&lt;/P&gt;&lt;P&gt;- Does your notebook contain conditional logic that calls dbutils.notebook.exit("Success")?&lt;/P&gt;&lt;P&gt;- Are you triggering this job manually while someone else is using the cluster?&lt;/P&gt;&lt;P&gt;Also, check the runs on the cluster: go to the &lt;STRONG&gt;Compute&lt;/STRONG&gt; page -&amp;gt; select your cluster -&amp;gt; &lt;STRONG&gt;Runs&lt;/STRONG&gt; tab. This shows exactly which jobs/notebooks were attached to that cluster at the moment of termination.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Jan 2026 09:23:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/run-failed-with-error-message-cluster-was-terminated-reason-job/m-p/144858#M52397</guid>
      <dc:creator>Raman_Unifeye</dc:creator>
      <dc:date>2026-01-22T09:23:09Z</dc:date>
    </item>
    <item>
      <title>Re: Run failed with error message  Cluster was terminated. Reason: JOB_FINISHED (SUCCESS)</title>
      <link>https://community.databricks.com/t5/data-engineering/run-failed-with-error-message-cluster-was-terminated-reason-job/m-p/153732#M54004</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hi — the &lt;/SPAN&gt;&lt;SPAN&gt;JOB_FINISHED (SUCCESS)&lt;/SPAN&gt;&lt;SPAN&gt; termination reason is the key clue here. It means &lt;/SPAN&gt;&lt;STRONG&gt;another job that was using the same all-purpose cluster finished&lt;/STRONG&gt;&lt;SPAN&gt;, and its completion triggered the cluster termination — taking your still-running job down with it.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Most Likely Cause&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;When multiple workflows share the same all-purpose cluster via &lt;/SPAN&gt;&lt;SPAN&gt;existing_cluster_id&lt;/SPAN&gt;&lt;SPAN&gt;, any one of those jobs finishing can trigger the cluster lifecycle to mark it as "job finished." If the cluster's context gets tied to the completing job, it terminates even though your job is still active. This is a known pitfall of running workflows on shared all-purpose clusters.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Troubleshooting Steps&lt;/STRONG&gt;&lt;/H3&gt;
&lt;OL&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;STRONG&gt;Check what else was running on the cluster&lt;/STRONG&gt;&lt;SPAN&gt; — Go to Compute → select your cluster → &lt;/SPAN&gt;&lt;STRONG&gt;Runs&lt;/STRONG&gt;&lt;SPAN&gt; tab. Look for any other job/notebook that completed around the exact time your cluster was terminated. That's likely the culprit.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;STRONG&gt;Check cluster event log timing&lt;/STRONG&gt;&lt;SPAN&gt; — In the cluster's Event Log, correlate the termination event timestamp with any other job completions. Even if details are sparse, the timestamp match will confirm the root cause.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;STRONG&gt;Check for `dbutils.notebook.exit()`&lt;/STRONG&gt;&lt;SPAN&gt; — If your notebook (or any notebook in the workflow) calls &lt;/SPAN&gt;&lt;SPAN&gt;dbutils.notebook.exit("Success")&lt;/SPAN&gt;&lt;SPAN&gt; in conditional logic, it can signal job completion prematurely.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;STRONG&gt;Check auto-termination settings&lt;/STRONG&gt;&lt;SPAN&gt; — If set too aggressively, the cluster may interpret a brief idle gap between tasks as inactivity. Look at Compute → Edit → Auto Termination value.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
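&lt;P&gt;&lt;SPAN&gt;On the exit() point in the steps above: a minimal sketch of how a conditional dbutils.notebook.exit("Success") ends a run early. The stub classes and the rows_loaded guard are hypothetical stand-ins so the pattern can run outside Databricks; in a real notebook, dbutils is injected by the runtime.&lt;/SPAN&gt;&lt;/P&gt;

```python
# Hypothetical sketch of the conditional-exit pattern from step 3.
# On Databricks, dbutils.notebook.exit(value) stops the notebook run
# immediately and reports `value` as the run result, so a branch that
# reaches it marks the run finished even if later cells never execute.
# The stub below stands in for the real dbutils object.

class NotebookExit(Exception):
    """Mimics the run-terminating behaviour of dbutils.notebook.exit()."""
    def __init__(self, result):
        super().__init__(result)
        self.result = result

class _Notebook:
    def exit(self, result):
        raise NotebookExit(result)

class _DbUtilsStub:
    notebook = _Notebook()

dbutils = _DbUtilsStub()  # provided automatically in a real notebook

def run_step(rows_loaded):
    # A guard like this is easy to overlook: hitting it reports SUCCESS
    # and ends the whole notebook run right here.
    if rows_loaded == 0:
        dbutils.notebook.exit("Success")
    return "processed %d rows" % rows_loaded

try:
    run_step(0)
except NotebookExit as exc:
    print("notebook exited early with result:", exc.result)
    # prints: notebook exited early with result: Success
```

&lt;P&gt;&lt;SPAN&gt;If a branch like this fires in any notebook sharing the cluster, the run it belongs to is reported as finished, which matches the JOB_FINISHED (SUCCESS) reason you are seeing.&lt;/SPAN&gt;&lt;/P&gt;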
&lt;H3&gt;&lt;STRONG&gt;Recommended Fix&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Switch to a job cluster (strongest fix):&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Instead of pointing your workflow at an all-purpose cluster, configure the workflow to use a &lt;/SPAN&gt;&lt;STRONG&gt;job cluster&lt;/STRONG&gt;&lt;SPAN&gt;. Each workflow run gets its own dedicated cluster that only terminates when &lt;/SPAN&gt;&lt;I&gt;&lt;SPAN&gt;that specific workflow&lt;/SPAN&gt;&lt;/I&gt;&lt;SPAN&gt; finishes. This completely eliminates the shared-cluster race condition.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;In your workflow JSON config, replace:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;"existing_cluster_id": "xxxx-xxxxxx-xxxxxxxx"&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;with:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;"job_clusters": [{
  "job_cluster_key": "my_job_cluster",
  "new_cluster": {
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "your_instance_type",
    "num_workers": 2,
    "data_security_mode": "USER_ISOLATION"
  }
}]&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;Or in the UI: Edit workflow → Task → Cluster dropdown → select &lt;/SPAN&gt;&lt;STRONG&gt;"New job cluster"&lt;/STRONG&gt;&lt;SPAN&gt; instead of an existing all-purpose cluster.&lt;/SPAN&gt;&lt;/P&gt;
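&lt;P&gt;&lt;SPAN&gt;One detail worth pairing with the JSON above: each task must opt into the job cluster via job_cluster_key, otherwise it keeps whatever cluster reference it had before. A hedged sketch (task_key and notebook_path are placeholders):&lt;/SPAN&gt;&lt;/P&gt;

```json
"tasks": [{
  "task_key": "my_task",
  "job_cluster_key": "my_job_cluster",
  "notebook_task": { "notebook_path": "/Workspace/path/to/notebook" }
}]
```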
&lt;P&gt;&lt;STRONG&gt;Other alternatives:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;STRONG&gt;Serverless compute&lt;/STRONG&gt;&lt;SPAN&gt; — no cluster management at all, fully isolated per job&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;STRONG&gt;Dedicated all-purpose cluster&lt;/STRONG&gt;&lt;SPAN&gt; — if you must use all-purpose, ensure no other jobs/workflows are configured to use the same cluster&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3&gt;&lt;STRONG&gt;Why All-Purpose Clusters Are Risky for Workflows&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;All-purpose clusters are designed for interactive, multi-user use. When workflows attach to them, the cluster lifecycle becomes unpredictable because multiple consumers (notebooks, workflows, SQL queries) compete for the same cluster context. Job clusters exist specifically to solve this — they provide 1:1 isolation between a workflow run and its compute.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Docs:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/jobs/compute" target="_blank"&gt;&lt;SPAN&gt;Configure Compute for Jobs&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/compute/troubleshooting/cluster-error-codes" target="_blank"&gt;&lt;SPAN&gt;Cluster Termination Reasons&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/jobs/repair-job-failures" target="_blank"&gt;&lt;SPAN&gt;Troubleshoot and Repair Job Failures&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN&gt;Hope this helps track it down!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Apr 2026 11:17:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/run-failed-with-error-message-cluster-was-terminated-reason-job/m-p/153732#M54004</guid>
      <dc:creator>anuj_lathi</dc:creator>
      <dc:date>2026-04-08T11:17:00Z</dc:date>
    </item>
  </channel>
</rss>

