Hi @tgburrin-afs, @Mounika_Tarigop,
As I understand it, the question is about running concurrent tasks within a single job run rather than running concurrent jobs.
`max_concurrent_runs` controls how many runs of the whole job can execute simultaneously, not how many tasks run in parallel within a single job run.
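For reference, `max_concurrent_runs` is a job-level setting in the job's JSON definition. A trimmed sketch (the job name is a placeholder and task details are omitted):

```json
{
  "name": "my-job",
  "max_concurrent_runs": 1,
  "tasks": []
}
```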
There is currently no direct feature in Databricks Jobs to specify a maximum number of concurrently running tasks within a single job run. Instead, you need to control concurrency through task dependencies or application logic.
Approaches to Limit Concurrent Tasks Within a Single Job Run
- Use Task Dependencies to Limit Parallelism:
Structure your job so that no more than three tasks run in the same "layer." For example, suppose you have 12 tasks in total: instead of having all 12 start at once, arrange them in four "waves" of three tasks each. In the Job UI or JSON configuration:
  - Start with three tasks (A, B, C) that have no upstream dependencies; they run simultaneously.
  - The next three tasks (D, E, F) only start after A, B, and C have all completed.
  - Repeat this pattern until all tasks have run. This ensures that at most three tasks are active at the same time (see the JSON sketch after this list).
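To make the "waves" concrete, here is a trimmed sketch of the first two waves in the Jobs JSON using `depends_on`; the task keys A-F are placeholders, and per-task details such as notebook paths are omitted:

```json
{
  "tasks": [
    { "task_key": "A" },
    { "task_key": "B" },
    { "task_key": "C" },
    { "task_key": "D", "depends_on": [{ "task_key": "A" }, { "task_key": "B" }, { "task_key": "C" }] },
    { "task_key": "E", "depends_on": [{ "task_key": "A" }, { "task_key": "B" }, { "task_key": "C" }] },
    { "task_key": "F", "depends_on": [{ "task_key": "A" }, { "task_key": "B" }, { "task_key": "C" }] }
  ]
}
```

One trade-off: a new wave only starts once every task in the previous wave has finished, so a single slow task can delay the entire next wave.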
- Implement Concurrency Control in Your Code:
If each Databricks task itself runs code that executes operations against the external system, implement the concurrency control in your application logic.
For example, if a single task processes multiple items and you don't want more than three operations to hit the external system concurrently, use a thread pool within the task's code:
```python
from concurrent.futures import ThreadPoolExecutor

# process_item and items_to_process are placeholders for your own
# per-item logic and workload.
# Limit to 3 concurrent operations against the external system
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(process_item, item) for item in items_to_process]
    # result() blocks until each item finishes and re-raises worker exceptions
    results = [f.result() for f in futures]
```
- This approach requires merging multiple pieces of logic into a single task and controlling concurrency at the code level.
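Keep in mind that the thread pool above runs wherever the task's code runs (the driver, for a typical notebook task), so it only bounds calls submitted from that one task; `max_workers` caps in-flight operations no matter how many items are queued.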