for_each_task with pool clusters

david_btmpl — Thu, 24 Apr 2025 15:46:48 GMT

I am trying to run a `for_each_task` over a list of `N` inputs with `concurrency` set to `M`, where N >> M. To reduce cluster startup time, I want to use pool clusters.

Now, when I set everything up, I notice that instead of `M` concurrent clusters, only a single pool cluster instance is created that is used across all M jobs.

Is there a way to tackle this, or does `for_each_task` not support cluster pools?

Re: for_each_task with pool clusters

SP_6721 — Fri, 25 Apr 2025 10:18:29 GMT

Hi @david_btmpl

When you set up a Databricks workflow using `for_each_task` with a cluster pool (`instance_pool_id`), Databricks will, by default, reuse the same cluster for all concurrent iterations in that job. So even if you’ve set a higher `concurrency` (M > 1), all of those iterations will still run on a single shared cluster.

If your goal is to have M separate clusters running at the same time, you’ll need to configure each task (or job) with its own `new_cluster` block, all pointing to the same instance pool. This approach gives you the cluster-level concurrency you’re looking for, while still benefiting from the reduced startup time that pools provide.
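
As a rough illustration of the setup described above, here is a minimal sketch in Python that builds a Jobs API style payload in which the nested task carries its own `new_cluster` block pointing at the pool. The helper name `build_for_each_job`, the task keys, the pool ID, the notebook path, and the `spark_version` value are all placeholders I made up for illustration. Whether each iteration really gets its own cluster with this layout is what the suggestion above implies, so please verify it in your workspace.

```python
import json


def build_for_each_job(inputs, concurrency, instance_pool_id, notebook_path,
                       spark_version="15.4.x-scala2.12", num_workers=1):
    """Build a job payload (hypothetical helper, not part of any Databricks SDK).

    The nested task defines its own new_cluster that draws from the pool,
    instead of referencing a shared job cluster via job_cluster_key.
    """
    nested_task = {
        "task_key": "process_one_input",  # placeholder name
        "notebook_task": {
            "notebook_path": notebook_path,
            # {{input}} is replaced with the current iteration's value
            "base_parameters": {"input": "{{input}}"},
        },
        "new_cluster": {
            "spark_version": spark_version,  # placeholder runtime version
            "instance_pool_id": instance_pool_id,
            "num_workers": num_workers,
        },
    }
    return {
        "name": "for_each_with_pool",  # placeholder job name
        "tasks": [
            {
                "task_key": "fan_out",  # placeholder name
                "for_each_task": {
                    # inputs is passed as a JSON-encoded array string
                    "inputs": json.dumps(list(inputs)),
                    "concurrency": concurrency,
                    "task": nested_task,
                },
            }
        ],
    }


if __name__ == "__main__":
    payload = build_for_each_job(
        inputs=["a", "b", "c", "d"],
        concurrency=2,
        instance_pool_id="my-pool-id",           # placeholder
        notebook_path="/Workspace/path/to/nb",   # placeholder
    )
    print(json.dumps(payload, indent=2))
```

The resulting dictionary can be sent as the body of a job create request or adapted into your own job definition. Keep in mind that the pool needs enough capacity for M clusters at once for the concurrency to take effect.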
