<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to reuse a cluster with Databricks Asset bundles in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-reuse-a-cluster-with-databricks-asset-bundles/m-p/62503#M31981</link>
    <description>&lt;P&gt;Hello,&lt;BR /&gt;Jobs in Databricks work in a specific way: a job definition also contains the cluster definition, because when you run a job, a new cluster is created from the cluster specification you provided for the job, and it exists only until the job completes. You can define a cluster at the job level or for individual tasks. Within a job you can reuse the same cluster, so multiple tasks can run on it.&lt;/P&gt;&lt;P&gt;If you want, you can share a cluster between jobs, BUT it will have to be an All Purpose Cluster, which costs about 2x more DBUs. Using all-purpose clusters for jobs is not recommended unless you have very specific needs.&lt;/P&gt;&lt;P&gt;Asset bundles are still in public preview and not very well documented yet, so you can always refer to the API documentation:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/api/workspace/jobs/create" target="_new"&gt;API Documentation Link&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The documentation includes an example that uses:&lt;/P&gt;&lt;PRE&gt;existing_cluster_id&lt;/PRE&gt;&lt;P&gt;This is the ID of an All Purpose Cluster, which you can find in the JSON definition of the cluster.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Wojciech_BUK_1-1709461581081.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/6484i77E8B0E6A79DB205/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="Wojciech_BUK_1-1709461581081.png" alt="Wojciech_BUK_1-1709461581081.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;So in YAML it will be:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;      tasks:
        - task_key: notebook_task
          existing_cluster_id: &amp;lt;id-of-your-existing-cluster&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;But as I mentioned, it is recommended to use Job Clusters. You can define multiple job clusters in a job; each entry looks like this:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;      job_clusters:
        - job_cluster_key: &amp;lt;some-unique-programmatic-identifier-for-this-key&amp;gt;
          new_cluster:
            # Cluster settings.&lt;/LI-CODE&gt;&lt;P&gt;Then use them WITHIN the job by setting job_cluster_key in the task specifications.&lt;/P&gt;&lt;P&gt;This section of the documentation shows how to do it:&lt;/P&gt;&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/how-to/use-bundles-with-jobs#step-5-add-a-bundle-configuration-file-to-the-project" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/how-to/use-bundles-with-jobs#step-5-add-a-bundle-configuration-file-to-the-project&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 03 Mar 2024 10:39:58 GMT</pubDate>
    <dc:creator>Wojciech_BUK</dc:creator>
    <dc:date>2024-03-03T10:39:58Z</dc:date>
    <item>
      <title>How to reuse a cluster with Databricks Asset bundles</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-reuse-a-cluster-with-databricks-asset-bundles/m-p/62449#M31971</link>
      <description>&lt;P&gt;I am using Databricks Asset Bundles (DAB) as an IaC tool with Databricks. I want to create a cluster using DAB and then reuse the same cluster in multiple jobs, but I cannot find an example of this. All the examples I have found specify an individual new cluster when defining each job. How can we reuse a cluster?&lt;/P&gt;</description>
      <pubDate>Fri, 01 Mar 2024 14:06:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-reuse-a-cluster-with-databricks-asset-bundles/m-p/62449#M31971</guid>
      <dc:creator>sumitdesai</dc:creator>
      <dc:date>2024-03-01T14:06:00Z</dc:date>
    </item>
    <item>
      <title>Re: How to reuse a cluster with Databricks Asset bundles</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-reuse-a-cluster-with-databricks-asset-bundles/m-p/62503#M31981</link>
      <description>&lt;P&gt;Hello,&lt;BR /&gt;Jobs in Databricks work in a specific way: a job definition also contains the cluster definition, because when you run a job, a new cluster is created from the cluster specification you provided for the job, and it exists only until the job completes. You can define a cluster at the job level or for individual tasks. Within a job you can reuse the same cluster, so multiple tasks can run on it.&lt;/P&gt;&lt;P&gt;If you want, you can share a cluster between jobs, BUT it will have to be an All Purpose Cluster, which costs about 2x more DBUs. Using all-purpose clusters for jobs is not recommended unless you have very specific needs.&lt;/P&gt;&lt;P&gt;Asset bundles are still in public preview and not very well documented yet, so you can always refer to the API documentation:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/api/workspace/jobs/create" target="_new"&gt;API Documentation Link&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The documentation includes an example that uses:&lt;/P&gt;&lt;PRE&gt;existing_cluster_id&lt;/PRE&gt;&lt;P&gt;This is the ID of an All Purpose Cluster, which you can find in the JSON definition of the cluster.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Wojciech_BUK_1-1709461581081.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/6484i77E8B0E6A79DB205/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="Wojciech_BUK_1-1709461581081.png" alt="Wojciech_BUK_1-1709461581081.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;So in YAML it will be:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;      tasks:
        - task_key: notebook_task
          existing_cluster_id: &amp;lt;id-of-your-existing-cluster&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;But as I mentioned, it is recommended to use Job Clusters. You can define multiple job clusters in a job; each entry looks like this:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;      job_clusters:
        - job_cluster_key: &amp;lt;some-unique-programmatic-identifier-for-this-key&amp;gt;
          new_cluster:
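            # Illustrative values only (assumptions, not from the original post).
            # Replace them with a runtime version and node type that exist in
            # your workspace; Standard_DS3_v2 is an Azure node type.
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 2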
            # Cluster settings.&lt;/LI-CODE&gt;&lt;P&gt;Then use them WITHIN the job by setting job_cluster_key in the task specifications.&lt;/P&gt;&lt;P&gt;This section of the documentation shows how to do it:&lt;/P&gt;&lt;P&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/how-to/use-bundles-with-jobs#step-5-add-a-bundle-configuration-file-to-the-project" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/how-to/use-bundles-with-jobs#step-5-add-a-bundle-configuration-file-to-the-project&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 03 Mar 2024 10:39:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-reuse-a-cluster-with-databricks-asset-bundles/m-p/62503#M31981</guid>
      <dc:creator>Wojciech_BUK</dc:creator>
      <dc:date>2024-03-03T10:39:58Z</dc:date>
    </item>
    <item>
      <title>Re: How to reuse a cluster with Databricks Asset bundles</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-reuse-a-cluster-with-databricks-asset-bundles/m-p/98764#M39836</link>
      <description>&lt;P&gt;Hi, would it also be possible to reuse the same job cluster for multiple "Run Job" Tasks?&lt;/P&gt;</description>
      <pubDate>Thu, 14 Nov 2024 09:51:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-reuse-a-cluster-with-databricks-asset-bundles/m-p/98764#M39836</guid>
      <dc:creator>felix_</dc:creator>
      <dc:date>2024-11-14T09:51:53Z</dc:date>
    </item>
    <item>
      <title>Re: How to reuse a cluster with Databricks Asset bundles</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-reuse-a-cluster-with-databricks-asset-bundles/m-p/98858#M39860</link>
      <description>&lt;P&gt;I can think of a way, if you are fine with running those jobs one after another. You can create a new job, add one task for each of the original jobs, and chain the tasks together. You then need to configure just one job cluster, and that same cluster should be reused by all the tasks.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Nov 2024 04:23:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-reuse-a-cluster-with-databricks-asset-bundles/m-p/98858#M39860</guid>
      <dc:creator>sumitdesai</dc:creator>
      <dc:date>2024-11-15T04:23:34Z</dc:date>
    </item>
  </channel>
</rss>

