Data Engineering

How to limit max concurrent tasks runs in a job?

yit337
Contributor

I read somewhere that there's a max_concurrent_task_runs property, but I can't find it anywhere in the docs. How can I limit the maximum number of tasks that run concurrently within a job?

1 ACCEPTED SOLUTION

Ashwin_DSA
Databricks Employee

Hi @yit337,

There isn’t a max_concurrent_task_runs setting in Databricks Jobs. The only job-level setting is max_concurrent_runs, which limits how many runs of the same job can be active at once; separately, there is a workspace-wide limit of 2000 concurrent task runs. If you need to cap how many tasks from a single run execute in parallel, you currently have to do it yourself, either by structuring the DAG in waves (so that only N tasks are runnable at a time) or by adding concurrency control inside the task code (for example, a thread pool with max_workers = N).
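As a rough illustration of the in-task option, here is a minimal Python sketch that uses the standard library's ThreadPoolExecutor with max_workers = N. The names process_item, run_all, and MAX_PARALLEL are hypothetical placeholders, not part of any Databricks API, and the sleep call merely stands in for real work.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 2  # hypothetical cap; this is the "N" from the answer above

_lock = threading.Lock()
_active = 0       # number of items being processed right now
peak_active = 0   # highest concurrency observed, kept only for demonstration


def process_item(item):
    """Placeholder for the real per-item work (assumption: work is I/O-bound)."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.05)  # stand-in for real work
    with _lock:
        _active -= 1
    return item.upper()


def run_all(items, max_parallel=MAX_PARALLEL):
    """Process every item, with at most max_parallel running at the same time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(process_item, items))


results = run_all(["a", "b", "c", "d", "e"])
```

This caps parallelism inside a single task only; it has no effect on how many separate tasks the Jobs scheduler starts at once.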

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

2 REPLIES

amirabedhiafi
New Contributor III

Hello @yit337!

I don't think there is a job-level max_concurrent_task_runs setting for normal DAG tasks.

There are, however, two related settings:

1. You can limit concurrent runs of the same job:

resources:
  jobs:
    my_job:
      name: my_job
      max_concurrent_runs: 1

This limits how many runs of the same job can overlap. It does not limit how many tasks run in parallel inside one job run. See https://docs.databricks.com/aws/en/jobs/configure-job

2. You can limit the parallel iterations of a repeated task by using a For each task and setting its concurrency:

tasks:
  - task_key: process_items
    for_each_task:
      inputs: '["A", "B", "C", "D"]'
      concurrency: 2
      task:
        task_key: process_one_item
        notebook_task:
          notebook_path: ../src/process_one_item.py

See https://docs.databricks.com/aws/en/dev-tools/bundles/job-task-types

For normal, separate tasks in the same job, concurrency is determined by the DAG: tasks without dependencies can run in parallel, while tasks with depends_on start only after their dependencies have finished. So to limit parallelism, you can group tasks into waves using dependencies. See https://docs.databricks.com/aws/en/jobs/control-flow
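To make the wave idea concrete, here is a small Python sketch that takes a list of task keys and a cap N and produces task entries in which every task in one wave depends on all tasks in the previous wave. The helper name build_waves and the task keys are hypothetical, and each entry would still need a real task type (for example notebook_task) before it could be used in a job definition.

```python
def build_waves(task_keys, max_parallel):
    """Return task dicts arranged so that at most max_parallel are runnable at once.

    Tasks are split into consecutive waves of size max_parallel; each task in a
    wave depends on every task in the wave before it.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")

    # Chunk the keys into waves of at most max_parallel tasks
    waves = [
        task_keys[i:i + max_parallel]
        for i in range(0, len(task_keys), max_parallel)
    ]

    tasks = []
    previous_wave = []
    for wave in waves:
        for key in wave:
            task = {"task_key": key}
            if previous_wave:
                # depends_on entries are written as a list of task_key references
                task["depends_on"] = [{"task_key": dep} for dep in previous_wave]
            tasks.append(task)
        previous_wave = wave
    return tasks


# Hypothetical task keys, capped at two tasks in parallel
tasks = build_waves(["t1", "t2", "t3", "t4", "t5"], max_parallel=2)
```

One trade-off of this layout is that a wave cannot start until the slowest task in the previous wave has finished, so the cap is respected at the cost of some idle time.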

If this answer resolves your question, could you please mark it as “Accept as Solution”? It will help other users quickly find the correct fix.

Senior BI/Data Engineer | Microsoft MVP Data Platform | Microsoft MVP Power BI | Power BI Super User | C# Corner MVP
