Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

for_each_task with pool clusters

david_btmpl
New Contributor II

I am trying to run a `for_each_task` over an input list of length `N` with `concurrency` `M`, where N >> M. To mitigate cluster startup time, I want to use pool clusters.

However, when I set everything up, I notice that instead of `M` concurrent clusters, only a single pool cluster instance is created and shared across all `M` concurrent iterations.

Is there a way to work around this, or does `for_each_task` not support cluster pools?


1 REPLY

SP_6721
Contributor

Hi @david_btmpl 

When you set up a Databricks workflow using for_each_task with a cluster pool (instance_pool_id), Databricks will, by default, reuse the same cluster for all concurrent tasks in that job. So even if you've set a higher concurrency (like M > 1), all those tasks will still run on a single shared cluster.

If your goal is to have M separate clusters running at the same time, you'll need to configure each task (or job) with its own new_cluster block, all pointing to the same instance pool. This approach gives you the cluster-level concurrency you're looking for, while still benefiting from the reduced startup time that pools provide.
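For illustration, a rough sketch of what that setup might look like in an asset-bundle-style job definition. The job name, notebook path, pool ID, and Spark version below are placeholders, not values from the thread, and the exact field names should be checked against the Jobs API reference:

```yaml
resources:
  jobs:
    fan_out_job:                       # hypothetical job name
      name: fan-out-over-inputs
      tasks:
        - task_key: fan_out
          for_each_task:
            inputs: '["a", "b", "c"]'  # the N inputs
            concurrency: 3             # M parallel iterations
            task:
              task_key: fan_out_iteration
              notebook_task:
                notebook_path: /Workspace/path/to/notebook   # placeholder
                base_parameters:
                  input: "{{input}}"   # current loop element
              # A per-iteration new_cluster drawn from the pool, rather
              # than a shared job cluster, so each concurrent iteration
              # can get its own cluster:
              new_cluster:
                spark_version: 15.4.x-scala2.12   # placeholder
                instance_pool_id: your-pool-id    # placeholder
                num_workers: 2
```

Note that when `instance_pool_id` is set, the pool determines the node type, so `node_type_id` is omitted from the `new_cluster` block.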
