<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Variable Compute clusters within a Job in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/variable-compute-clusters-within-a-job/m-p/124925#M47292</link>
    <description>&lt;P&gt;We have 3 possible compute clusters that we can run a notebook against.&lt;BR /&gt;They are varying sizes and the one that the notebook uses will depend on the size of the data being processed.&lt;/P&gt;&lt;P&gt;We "t-shirt size" each tenant based on their data size (S, M, L) and can read this config in from Postgres in a notebook.&lt;BR /&gt;Once we know the t-shirt size, is there a way of setting the compute cluster dynamically in subsequent tasks?&amp;nbsp;&lt;BR /&gt;e.g. a tenant is size M so the rest of the tasks in the job run on the M cluster&lt;/P&gt;&lt;P&gt;We'd like to avoid duplicating jobs/tasks!&lt;BR /&gt;Thanks in advance&lt;/P&gt;</description>
    <pubDate>Fri, 11 Jul 2025 14:48:03 GMT</pubDate>
    <dc:creator>allyallen</dc:creator>
    <dc:date>2025-07-11T14:48:03Z</dc:date>
    <item>
      <title>Variable Compute clusters within a Job</title>
      <link>https://community.databricks.com/t5/data-engineering/variable-compute-clusters-within-a-job/m-p/124925#M47292</link>
      <description>&lt;P&gt;We have 3 possible compute clusters that we can run a notebook against.&lt;BR /&gt;They are varying sizes and the one that the notebook uses will depend on the size of the data being processed.&lt;/P&gt;&lt;P&gt;We "t-shirt size" each tenant based on their data size (S, M, L) and can read this config in from Postgres in a notebook.&lt;BR /&gt;Once we know the t-shirt size, is there a way of setting the compute cluster dynamically in subsequent tasks?&amp;nbsp;&lt;BR /&gt;e.g. a tenant is size M so the rest of the tasks in the job run on the M cluster&lt;/P&gt;&lt;P&gt;We'd like to avoid duplicating jobs/tasks!&lt;BR /&gt;Thanks in advance&lt;/P&gt;</description>
      <pubDate>Fri, 11 Jul 2025 14:48:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/variable-compute-clusters-within-a-job/m-p/124925#M47292</guid>
      <dc:creator>allyallen</dc:creator>
      <dc:date>2025-07-11T14:48:03Z</dc:date>
    </item>
    <item>
      <title>Re: Variable Compute clusters within a Job</title>
      <link>https://community.databricks.com/t5/data-engineering/variable-compute-clusters-within-a-job/m-p/124976#M47305</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/174732"&gt;@allyallen&lt;/a&gt;, just to clarify your use case to see if I can provide a solution:&lt;/P&gt;&lt;P&gt;Are you saying you have a single job with multiple tasks, and each of those tasks runs the same notebook (e.g., notebook_1), but you'd like the compute cluster to vary depending on the tenant's t-shirt size (S, M, L) determined within the notebook and a task?&lt;/P&gt;&lt;P&gt;Or is it more that you have a parent job (e.g., job_1) which dynamically triggers other jobs or notebooks, and you'd like each of those to run on the appropriate cluster based on the tenant’s size?&lt;/P&gt;</description>
      <pubDate>Fri, 11 Jul 2025 20:11:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/variable-compute-clusters-within-a-job/m-p/124976#M47305</guid>
      <dc:creator>eniwoke</dc:creator>
      <dc:date>2025-07-11T20:11:52Z</dc:date>
    </item>
    <item>
      <title>Re: Variable Compute clusters within a Job</title>
      <link>https://community.databricks.com/t5/data-engineering/variable-compute-clusters-within-a-job/m-p/125241#M47383</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/39807"&gt;@eniwoke&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for replying!&lt;BR /&gt;I have one job that has a string of other jobs and notebooks as tasks.&amp;nbsp; This job is designed to be run against different tenants as a way of ingesting data.&lt;/P&gt;&lt;P&gt;NB1 at the beginning of the job determines the t-shirt size for the tenant and if it's S, all subsequent tasks and jobs need to run on the S cluster.&amp;nbsp; If NB1 finds the t-shirt size is M, all following tasks and jobs will run on the M cluster.&lt;BR /&gt;At the moment, I can only set one cluster per task and can't see a way of dynamically setting the cluster to use based on the output of a previous task.&lt;BR /&gt;Hope this clarifies the ask a little bit!&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 15 Jul 2025 07:13:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/variable-compute-clusters-within-a-job/m-p/125241#M47383</guid>
      <dc:creator>allyallen</dc:creator>
      <dc:date>2025-07-15T07:13:38Z</dc:date>
    </item>
    <item>
      <title>Re: Variable Compute clusters within a Job</title>
      <link>https://community.databricks.com/t5/data-engineering/variable-compute-clusters-within-a-job/m-p/125338#M47425</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/174732"&gt;@allyallen&lt;/a&gt;, thanks for the explanation. Yes, you are right; there is no direct way to change the cluster for a task from within the same job. However, you can still achieve a similar result with a few tweaks.&lt;/P&gt;&lt;P&gt;You can start by splitting the job into two jobs, say job_1 and job_2. The task that runs NB1 will be in job_1, and the other tasks can be in job_2.&lt;/P&gt;&lt;P&gt;Since you already know the &lt;STRONG&gt;job name/id&lt;/STRONG&gt; for job_2, you can use the &lt;A href="https://docs.databricks.com/api/workspace/jobs/update#new_settings" target="_self"&gt;update job settings&lt;/A&gt; endpoint to update the cluster for the job. Of course, the downside is that you'll need to know the &lt;STRONG&gt;job_id&lt;/STRONG&gt; beforehand, and you'd be using NB1 to update job_2's cluster. That's one approach.&lt;/P&gt;&lt;P&gt;Another approach is to create &lt;STRONG&gt;job_2&lt;/STRONG&gt; programmatically in NB1 every time the t-shirt size changes.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="eniwoke_0-1752596597545.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/18203iF7F607271F20413D/image-size/medium?v=v2&amp;amp;px=400" role="button" title="eniwoke_0-1752596597545.png" alt="eniwoke_0-1752596597545.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Let me know if it helps &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
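For anyone trying the first approach, here is a minimal sketch of what NB1 could do. It assumes the tasks in job_2 run on existing clusters (the existing_cluster_id field) and that you have already fetched job_2's task list, for example from the jobs/get endpoint. The cluster IDs, the CLUSTER_BY_SIZE mapping and both helper names are made up for illustration; they are not from this thread. As far as I know, jobs/update merges the tasks array by task_key but replaces each listed task entry whole, which is why the sketch sends back complete task entries rather than only the changed field.

```python
import copy
import json
import urllib.request

# Hypothetical mapping from t-shirt size to an existing cluster ID (placeholders).
CLUSTER_BY_SIZE = {"S": "cluster-id-small", "M": "cluster-id-medium", "L": "cluster-id-large"}


def build_update_payload(job_id, tasks, size, cluster_by_size=CLUSTER_BY_SIZE):
    """Build a jobs/update body that points cluster-backed tasks at the cluster for `size`."""
    cluster_id = cluster_by_size[size]  # raises KeyError for an unknown size
    new_tasks = []
    for task in tasks:
        task = copy.deepcopy(task)  # leave the fetched settings untouched
        if "existing_cluster_id" in task:
            task["existing_cluster_id"] = cluster_id
        # Tasks without existing_cluster_id (e.g. run_job_task) pass through unchanged.
        new_tasks.append(task)
    return {"job_id": job_id, "new_settings": {"tasks": new_tasks}}


def post_json(host, token, path, payload):
    """POST `payload` as JSON to a workspace REST endpoint and return the decoded reply."""
    request = urllib.request.Request(
        f"{host}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read() or b"{}")


# Possible use inside NB1 (host, token, job_id and tasks are yours to supply):
# payload = build_update_payload(job_id, tasks, "M")
# post_json(host, token, "/api/2.1/jobs/update", payload)
# post_json(host, token, "/api/2.1/jobs/run-now", {"job_id": job_id})
```

Keeping the payload builder separate from the HTTP call means the size-to-cluster logic can be checked in a notebook cell without touching the workspace.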
      <pubDate>Tue, 15 Jul 2025 17:14:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/variable-compute-clusters-within-a-job/m-p/125338#M47425</guid>
      <dc:creator>eniwoke</dc:creator>
      <dc:date>2025-07-15T17:14:02Z</dc:date>
    </item>
    <item>
      <title>Re: Variable Compute clusters within a Job</title>
      <link>https://community.databricks.com/t5/data-engineering/variable-compute-clusters-within-a-job/m-p/125447#M47447</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/39807"&gt;@eniwoke&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;That's a great solution, thank you so much!&lt;BR /&gt;Our process is now as follows:&lt;BR /&gt;NB1 gets the tenant t-shirt size and sets the cluster_id for each size as a variable.&lt;BR /&gt;The notebook then loops through each tenant and, using the Databricks API, updates the tasks within the job to the right cluster_id and triggers a run of the main job.&lt;/P&gt;&lt;P&gt;After testing (with one tenant as S and one as M), the right job was triggered twice (once for each tenant) and each of those runs ran on the right-sized cluster for the tenant in question.&lt;/P&gt;&lt;P&gt;It's just what we were after, thank you so so much for your help!&lt;BR /&gt;Ally&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jul 2025 14:10:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/variable-compute-clusters-within-a-job/m-p/125447#M47447</guid>
      <dc:creator>allyallen</dc:creator>
      <dc:date>2025-07-16T14:10:36Z</dc:date>
    </item>
    <item>
      <title>Re: Variable Compute clusters within a Job</title>
      <link>https://community.databricks.com/t5/data-engineering/variable-compute-clusters-within-a-job/m-p/125465#M47453</link>
      <description>&lt;P&gt;Fantastic, I'm glad to hear it worked!&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jul 2025 15:33:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/variable-compute-clusters-within-a-job/m-p/125465#M47453</guid>
      <dc:creator>eniwoke</dc:creator>
      <dc:date>2025-07-16T15:33:02Z</dc:date>
    </item>
  </channel>
</rss>

