<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic for_each_task with pool clusters in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/for-each-task-with-pool-clusters/m-p/116494#M3282</link>
    <description>&lt;P&gt;I am trying to run a `for_each_task` over `N` inputs with `concurrency` set to `M`, where N &amp;gt;&amp;gt; M. To reduce cluster startup time, I want to use clusters backed by an instance pool.&lt;/P&gt;&lt;P&gt;With everything set up, I notice that instead of `M` concurrent clusters, only a single pool-backed cluster is created, and it is shared by all M concurrent iterations.&lt;/P&gt;&lt;P&gt;Is there a way around this, or does `for_each_task` not support cluster pools?&lt;/P&gt;</description>
    <pubDate>Thu, 24 Apr 2025 15:46:48 GMT</pubDate>
    <dc:creator>david_btmpl</dc:creator>
    <dc:date>2025-04-24T15:46:48Z</dc:date>
    <item>
      <title>for_each_task with pool clusters</title>
      <link>https://community.databricks.com/t5/administration-architecture/for-each-task-with-pool-clusters/m-p/116494#M3282</link>
      <description>&lt;P&gt;I am trying to run a `for_each_task` over `N` inputs with `concurrency` set to `M`, where N &amp;gt;&amp;gt; M. To reduce cluster startup time, I want to use clusters backed by an instance pool.&lt;/P&gt;&lt;P&gt;With everything set up, I notice that instead of `M` concurrent clusters, only a single pool-backed cluster is created, and it is shared by all M concurrent iterations.&lt;/P&gt;&lt;P&gt;Is there a way around this, or does `for_each_task` not support cluster pools?&lt;/P&gt;</description>
      <pubDate>Thu, 24 Apr 2025 15:46:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/for-each-task-with-pool-clusters/m-p/116494#M3282</guid>
      <dc:creator>david_btmpl</dc:creator>
      <dc:date>2025-04-24T15:46:48Z</dc:date>
    </item>
    <item>
      <title>Re: for_each_task with pool clusters</title>
      <link>https://community.databricks.com/t5/administration-architecture/for-each-task-with-pool-clusters/m-p/116558#M3287</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/161639"&gt;@david_btmpl&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;When you set up a Databricks workflow that uses `for_each_task` with a cluster pool (`instance_pool_id`), Databricks will, by default, reuse the same cluster for all concurrent iterations in that job. So even if you’ve set a higher concurrency (M &amp;gt; 1), those iterations will still run on a single shared cluster.&lt;/P&gt;&lt;P&gt;If your goal is to have M separate clusters running at the same time, you’ll need to configure each task (or job) with its own `new_cluster` block, all pointing to the same instance pool. This approach gives you the cluster-level concurrency you’re looking for, while still benefiting from the reduced startup time that pools provide.&lt;/P&gt;</description>
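      <content:encoded><![CDATA[
A minimal sketch of the configuration described in this reply, assuming the nested task of a `for_each_task` carries its own `new_cluster` block that points at the instance pool instead of referencing a shared job cluster. It only builds the job settings payload as a Python dict and makes no API call. The helper name `build_for_each_job`, the task keys, the job name, the default Spark version, and the `item` parameter name are all hypothetical placeholders. Whether each iteration then gets its own cluster is what the reply suggests and has not been verified here.

```python
import json


def build_for_each_job(inputs, concurrency, instance_pool_id, notebook_path,
                       spark_version="15.4.x-scala2.12", num_workers=1):
    """Build a Jobs API style settings dict (hypothetical helper, no API call)."""
    # Cluster spec drawn from the pool. node_type_id is left out on purpose,
    # because the pool determines the instance type.
    new_cluster = {
        "spark_version": spark_version,        # placeholder runtime version
        "instance_pool_id": instance_pool_id,  # same pool for every iteration
        "num_workers": num_workers,
    }
    return {
        "name": "for-each-with-pool",  # placeholder job name
        "tasks": [
            {
                "task_key": "fan_out",
                "for_each_task": {
                    # The inputs field is a JSON-encoded array string.
                    "inputs": json.dumps(inputs),
                    "concurrency": concurrency,  # M in the question
                    "task": {
                        "task_key": "fan_out_iteration",
                        "notebook_task": {
                            "notebook_path": notebook_path,
                            # {{input}} resolves to the current iteration value.
                            "base_parameters": {"item": "{{input}}"},
                        },
                        # Assumption: a per-task new_cluster here, rather than a
                        # job_cluster_key, avoids sharing one cluster.
                        "new_cluster": new_cluster,
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    job = build_for_each_job(["a", "b", "c"], 2, "<your-pool-id>", "/Workspace/path/to/notebook")
    print(json.dumps(job, indent=2))
```

The resulting dict could be submitted through whichever client you already use for job definitions. Keep the pool's idle and maximum capacity at or above M times the per-cluster node count, otherwise the extra clusters will wait on instance provisioning.
]]></content:encoded>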
      <pubDate>Fri, 25 Apr 2025 10:18:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/for-each-task-with-pool-clusters/m-p/116558#M3287</guid>
      <dc:creator>SP_6721</dc:creator>
      <dc:date>2025-04-25T10:18:29Z</dc:date>
    </item>
  </channel>
</rss>

