Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

for_each_task with pool clusters

david_btmpl
New Contributor II

I am trying to run a `for_each_task` over an input list of length `N` with `concurrency` `M`, where N >> M. To mitigate cluster startup time, I want to use pool clusters.

However, when I set everything up, I notice that instead of `M` concurrent clusters, only a single pool cluster instance is created and shared across all `M` concurrent iterations.

Is there a way to work around this, or does `for_each_task` not support cluster pools?


1 REPLY

SP_6721
Contributor

Hi @david_btmpl 

When you set up a Databricks workflow using for_each_task with a cluster pool (instance_pool_id), Databricks will, by default, reuse the same cluster for all concurrent tasks in that job. So even if you've set a higher concurrency (like M > 1), all those tasks will still run on a single shared cluster.

If your goal is to have M separate clusters running at the same time, you'll need to configure each task (or job) with its own new_cluster block, all pointing to the same instance pool. This approach gives you the cluster-level concurrency you're looking for, while still benefiting from the reduced startup time that pools provide.
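For illustration, a rough sketch of what that setup might look like in an asset-bundle-style job definition. The job name, notebook path, pool ID, and Spark version below are placeholders, not values from the thread, and the exact field names should be checked against the Jobs API reference:

```yaml
resources:
  jobs:
    fan_out_job:                       # hypothetical job name
      name: fan-out-over-inputs
      tasks:
        - task_key: fan_out
          for_each_task:
            inputs: '["a", "b", "c"]'  # the N inputs
            concurrency: 3             # M parallel iterations
            task:
              task_key: fan_out_iteration
              notebook_task:
                notebook_path: /Workspace/path/to/notebook   # placeholder
                base_parameters:
                  input: "{{input}}"   # current loop element
              # A per-iteration new_cluster drawn from the pool, rather
              # than a shared job cluster, so each concurrent iteration
              # can get its own cluster:
              new_cluster:
                spark_version: 15.4.x-scala2.12   # placeholder
                instance_pool_id: your-pool-id    # placeholder
                num_workers: 2
```

Note that when `instance_pool_id` is set, the pool determines the node type, so `node_type_id` is omitted from the `new_cluster` block.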
