Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
A Job "pool"? (or task pool)

spott
New Contributor II

I'm trying to run a single job multiple times with different parameters where the number of concurrent jobs is less than the number of parameters.

I have a job (or task...) J that takes a parameter set p. I have 100 values of p I want to run, but only 10 should run at a time (say I have 10 clusters, or I want to run them all on one cluster and that cluster only has enough compute for 10 concurrent runs). Still, all 100 should eventually run.

Is this possible? The maximum-concurrent-runs setting just skips the extra runs, and tasks without dependencies all start at the same time.

I'm aware there is often a way to do something like this in Spark itself, but that feels heavy-handed (and would require rearchitecting my code).

1 REPLY

Aviral-Bhardwaj
Esteemed Contributor III

This is something new, and an interesting question. Try reaching out to the Databricks support team; they may have a good idea here.

AviralBhardwaj
