<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Switching from All-Purpose to Job Compute – How to Reuse Cluster in Parent/Child Jobs? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/switching-from-all-purpose-to-job-compute-how-to-reuse-cluster/m-p/114656#M44897</link>
    <description>Discussion thread: reusing a single job compute cluster across parent and child jobs after switching from all-purpose to job compute in Databricks.</description>
    <pubDate>Sun, 06 Apr 2025 21:40:50 GMT</pubDate>
    <dc:creator>Abiola1</dc:creator>
    <dc:date>2025-04-06T21:40:50Z</dc:date>
    <item>
      <title>Switching from All-Purpose to Job Compute – How to Reuse Cluster in Parent/Child Jobs?</title>
      <link>https://community.databricks.com/t5/data-engineering/switching-from-all-purpose-to-job-compute-how-to-reuse-cluster/m-p/114645#M44894</link>
      <description>&lt;P&gt;I’m transitioning from all-purpose clusters to job compute to optimize costs. Previously, we reused an&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;existing_cluster_id&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in the job configuration to reduce total job runtime.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;My use case:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;A&amp;nbsp;&lt;STRONG&gt;parent job&lt;/STRONG&gt;&amp;nbsp;triggers multiple&amp;nbsp;&lt;STRONG&gt;child jobs sequentially&lt;/STRONG&gt;.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;I want to&amp;nbsp;&lt;STRONG&gt;create a job compute cluster in the parent job&lt;/STRONG&gt;&amp;nbsp;and&amp;nbsp;&lt;STRONG&gt;reuse the same cluster&lt;/STRONG&gt;&amp;nbsp;for all child jobs.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Has anyone implemented this? Any advice on achieving this setup would be greatly appreciated!&lt;/P&gt;</description>
      <pubDate>Sun, 06 Apr 2025 18:51:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/switching-from-all-purpose-to-job-compute-how-to-reuse-cluster/m-p/114645#M44894</guid>
      <dc:creator>satyam-verma</dc:creator>
      <dc:date>2025-04-06T18:51:59Z</dc:date>
    </item>
    <item>
      <title>Re: Switching from All-Purpose to Job Compute – How to Reuse Cluster in Parent/Child Jobs?</title>
      <link>https://community.databricks.com/t5/data-engineering/switching-from-all-purpose-to-job-compute-how-to-reuse-cluster/m-p/114656#M44897</link>
      <description>&lt;P&gt;Hi, here are two possible approaches.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Option 1: Parent job creates the cluster and passes the cluster ID to child jobs (workaround)&lt;/STRONG&gt;&lt;BR /&gt;This works, but it requires manual lifecycle management:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Create a job compute cluster via the Clusters API (REST or CLI).&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Extract the cluster_id from the create response.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Pass it as existing_cluster_id to all child jobs using the Jobs API or an orchestrator such as Databricks Workflows or Airflow.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Ensure each child job actually runs on that existing_cluster_id.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Have the parent job terminate the cluster at the end.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;You must manage the lifecycle (start and terminate) yourself, and there is a risk of cost overruns if termination fails.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Option 2: Convert the child jobs into tasks of a single multi-task job&lt;/STRONG&gt;&lt;BR /&gt;Instead of separate jobs, use one multi-task job where each child becomes a task. All tasks can then share a single job cluster defined at the job level. This is the recommended and most cost-efficient approach:&lt;/P&gt;&lt;PRE&gt;{
  "job_clusters": [
    {
      "job_cluster_key": "shared_job_cluster",
      "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2
      }
    }
  ],
  "tasks": [
    {
      "task_key": "task1",
      "notebook_task": { "notebook_path": "/Tasks/Task1" },
      "job_cluster_key": "shared_job_cluster"
    },
    {
      "task_key": "task2",
      "depends_on": [ { "task_key": "task1" } ],
      "notebook_task": { "notebook_path": "/Tasks/Task2" },
      "job_cluster_key": "shared_job_cluster"
    }
  ]
}&lt;/PRE&gt;&lt;P&gt;All tasks here use the same shared_job_cluster, and it is terminated automatically once the job finishes.&lt;/P&gt;</description>
      <pubDate>Sun, 06 Apr 2025 21:40:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/switching-from-all-purpose-to-job-compute-how-to-reuse-cluster/m-p/114656#M44897</guid>
      <dc:creator>Abiola1</dc:creator>
      <dc:date>2025-04-06T21:40:50Z</dc:date>
    </item>
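<!-- The Option 1 workaround in the reply above can be sketched in Python. This is a hedged sketch, not a definitive implementation: the endpoint paths follow the Databricks REST API (Clusters API 2.0, Jobs API 2.1), while the host, token, cluster spec, and child job IDs are placeholders, and the actual HTTP calls are left commented out. -->

```python
# Sketch of Option 1 (hedged): create a job cluster, hand its cluster_id
# to child jobs, then terminate it from the parent.
# HOST and TOKEN are placeholders, not real credentials.
import json
import urllib.request

HOST = "https://<workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                  # placeholder


def api_request(path, payload):
    """POST a JSON payload to a Databricks REST endpoint and parse the reply."""
    req = urllib.request.Request(
        f"{HOST}{path}",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def cluster_spec():
    """Cluster definition; mirrors the shape used in the JSON example above."""
    return {
        "cluster_name": "parent-shared-cluster",
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2,
    }


# Parent-job flow (not executed in this sketch):
#   cluster_id = api_request("/api/2.0/clusters/create", cluster_spec())["cluster_id"]
#   # Point each child job at the shared cluster (e.g. set existing_cluster_id
#   # in the child job's settings via jobs/update) before triggering it:
#   # api_request("/api/2.1/jobs/run-now", {"job_id": child_job_id})
#   api_request("/api/2.0/clusters/delete", {"cluster_id": cluster_id})  # terminate
```

<!-- As the reply notes, the termination step is the weak point: if the parent fails before the final clusters/delete call, the cluster keeps running and accruing cost. -->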
    <item>
      <title>Re: Switching from All-Purpose to Job Compute – How to Reuse Cluster in Parent/Child Jobs?</title>
      <link>https://community.databricks.com/t5/data-engineering/switching-from-all-purpose-to-job-compute-how-to-reuse-cluster/m-p/114660#M44899</link>
      <description>&lt;P&gt;Hi&amp;nbsp;satyam-verma,&lt;/P&gt;&lt;P&gt;How are you doing today? As I understand it, switching from all-purpose clusters to job compute can definitely help with cost optimization. In your case, where a parent job triggers multiple child jobs, it makes sense to want to reuse the same job cluster to avoid the overhead of spinning up a new one each time. However, Databricks job clusters are ephemeral: they are created at the start of a job run and shut down when it finishes, so they cannot be reused across multiple jobs the way all-purpose clusters can. A common workaround is to refactor your child jobs into tasks within a single multi-task job using Databricks Workflows. That way, all the tasks can share the same job cluster defined at the job level, and you still get the cost-saving benefits of job compute. If you absolutely need separate jobs, the only way to share compute is to go back to an all-purpose cluster, but that may defeat the cost benefits. Let me know if you want help setting up a multi-task workflow!&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Brahma&lt;/P&gt;</description>
      <pubDate>Mon, 07 Apr 2025 02:37:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/switching-from-all-purpose-to-job-compute-how-to-reuse-cluster/m-p/114660#M44899</guid>
      <dc:creator>Brahmareddy</dc:creator>
      <dc:date>2025-04-07T02:37:06Z</dc:date>
    </item>
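<!-- The multi-task refactor recommended in the replies above can be sketched as a Jobs API 2.1 `jobs/create` call. Hedged sketch: the host, token, and job/notebook names are placeholders; the job spec mirrors the JSON from the earlier reply, and the network call is left commented out. -->

```python
# Sketch of Option 2 (hedged): register one multi-task job whose tasks all
# share a single job cluster. HOST and TOKEN are placeholders.
import json
import urllib.request

HOST = "https://<workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                  # placeholder


def multi_task_job_spec():
    """Job settings: two notebook tasks sharing one job-level cluster."""
    return {
        "name": "parent-with-child-tasks",  # hypothetical job name
        "job_clusters": [{
            "job_cluster_key": "shared_job_cluster",
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }],
        "tasks": [
            {"task_key": "task1",
             "notebook_task": {"notebook_path": "/Tasks/Task1"},
             "job_cluster_key": "shared_job_cluster"},
            {"task_key": "task2",
             "depends_on": [{"task_key": "task1"}],
             "notebook_task": {"notebook_path": "/Tasks/Task2"},
             "job_cluster_key": "shared_job_cluster"},
        ],
    }


def create_job(spec):
    """POST the spec to /api/2.1/jobs/create; the response carries the job_id."""
    req = urllib.request.Request(
        f"{HOST}/api/2.1/jobs/create",
        data=json.dumps(spec).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]


# create_job(multi_task_job_spec())  # not executed in this sketch
```

<!-- Because the cluster is declared under job_clusters and referenced by job_cluster_key, Databricks starts it once for the run and terminates it when the last task finishes, which is the cost behavior the replies recommend. -->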
  </channel>
</rss>

