Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Lakeflow Declarative Pipeline queue

tt_921
New Contributor II

In the January 2026 release notes, it was announced that: "Pipelines now support queued execution mode, where multiple update requests are automatically queued and executed sequentially instead of failing with conflicts. This simplifies operations for pipelines with frequent update triggers and eliminates the need for manual retry coordination."

However, I am still seeing concurrent runs fail with "RUN_EXECUTION_ERROR: Pipeline update already in progress." I also don't see an option to apply this queue setting in the UI, nor any documentation for it in DAB. I tried the job-level `queue: enabled: true` setting in DAB, but this does not work.

Has the pipeline queue been working for anyone else?

1 ACCEPTED SOLUTION

Accepted Solutions

Lu_Wang_ENB_DBX
Databricks Employee

What's going on

  • The January 2026 release note is correct that the engine for Lakeflow Spark Declarative Pipelines supports queued execution, but it's a backend behavior, not a user-configurable option; there is no UI or DAB field to turn it on or off.
  • Internally, there has been a "queuing guardrail" and staged rollout work (e.g., discussion about enabling queued update execution, with that guardrail only now being reverted), so some environments/pipelines still behave as "fail on concurrent StartUpdate" instead of queueing.
  • Separately, there are open or very recent bugs where Jobs sends duplicate StartUpdate requests or loses the response, causing RUN_EXECUTION_ERROR: Pipeline update already in progress even when only one update is actually running; see ES-1635313 and follow-ons, and Nokia BL-16616 / SUP-27441, where the pipeline completes but the job task shows this error.
  • The Jobs queue setting (queue.enabled: true) you tried in DAB is a Jobs-level run queue, not the DLT/Lakeflow pipeline queuing feature, so toggling it won't change the pipeline control-plane behavior.
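
To make the distinction concrete, here is a minimal DAB sketch (resource names such as my_pipeline and my_pipeline_job are placeholders): the queue block is only accepted under a job, and the pipeline resource exposes no equivalent field for queued execution.

  resources:
    pipelines:
      my_pipeline:
        name: my_pipeline
        # ...libraries, catalog, etc.
        # no queue / queued-execution setting can be declared here
    jobs:
      my_pipeline_job:
        name: my_pipeline_job
        queue:
          enabled: true   # Jobs run queue only; does not change how the pipeline handles StartUpdate
        tasks:
          - task_key: run_pipeline
            pipeline_task:
              pipeline_id: ${resources.pipelines.my_pipeline.id}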

Below are three options:


Option 1 – Serialize all updates via one Job (no secondary triggers)

Mitigate by ensuring there is exactly one place that can trigger the pipeline and that its runs never overlap:

  • Use a single Lakeflow Job with one pipeline task, and set either:
    • max_concurrent_runs = 1, or
    • in DAB, on that same job:

      resources:
        jobs:
          my_pipeline_job:
            queue:
              enabled: true

  • Remove/avoid: manual "Start" in the pipeline UI, other Jobs, or ad-hoc API calls that can also call StartUpdate.

This won't fix the known Jobs bug cases where one job run issues duplicate StartUpdate, but it does remove most genuine concurrency conflicts.
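
Putting Option 1 together, a minimal DAB sketch might look like the following (my_pipeline_job, my_pipeline, and the cron schedule are placeholders to adapt to your bundle):

  resources:
    jobs:
      my_pipeline_job:
        name: my_pipeline_job
        max_concurrent_runs: 1    # at most one active run of this job
        queue:
          enabled: true           # extra triggers wait in the Jobs run queue instead of being skipped
        schedule:
          quartz_cron_expression: "0 0/15 * * * ?"   # example: every 15 minutes
          timezone_id: "UTC"
        tasks:
          - task_key: run_pipeline
            pipeline_task:
              pipeline_id: ${resources.pipelines.my_pipeline.id}

With this shape, a trigger that arrives while an update is still running waits in the job queue rather than issuing a second StartUpdate against the pipeline.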


Option 2 – Treat as known Jobs / pipeline integration bug and add retries

If your pattern is "job task sometimes fails with RUN_EXECUTION_ERROR but the underlying pipeline update actually succeeded" (as in ES-1635313 / Nokia cases), treat this as a transient integration bug:

  • In DAB, add a small max_retries on the pipeline task itself so the job auto-retries when it sees this specific error (see the sketch below).
  • Operationally treat these failures as "false negatives" for now (confirm via go/dlt/debug or the pipeline event log that only one update ran and completed).

This doesn't give you true queuing semantics, but it makes the symptom operationally tolerable until the backend rollout and Jobs fixes are fully in place.
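
As a sketch of what the retry knobs look like on the pipeline task (same placeholder job as in Option 1; tune the values to your tolerance). Note that Jobs retries are not scoped to a particular error message, so the task will retry on any failure, which is one more reason to keep max_retries small:

  resources:
    jobs:
      my_pipeline_job:
        tasks:
          - task_key: run_pipeline
            pipeline_task:
              pipeline_id: ${resources.pipelines.my_pipeline.id}
            max_retries: 2                      # a couple of automatic retries is usually enough
            min_retry_interval_millis: 300000   # wait ~5 minutes between attempts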


Option 3 – Escalate via Support/ES to get attached to existing incidents (Recommended)

Given you're still seeing RUN_EXECUTION_ERROR post-announcement and there are active incidents (ES-1635313, BL-16616/SUP-27441) specifically around this error and queued/duplicate StartUpdate behavior:

  • Open a Support ticket (or ES via Support) with:
    • Workspace ID, pipeline ID, and a few failing Job run URLs.
    • Note that this is "Lakeflow Pipelines + Lakeflow Jobs: RUN_EXECUTION_ERROR: Pipeline update already in progress despite queued execution being announced; please check against ES-1635313 / BL-16616 behavior."
  • In parallel, apply Option 1 (single orchestrating job, max_concurrent_runs or job queue) and, if needed, Option 2 (retries) as immediate mitigations.

Recommendation:
Use Option 3 as the primary path so engineering can confirm whether your workspace/pipelines are on the queued-execution rollout and attach you to the ongoing fixes, and in the meantime implement Option 1 (single orchestrator + serialization) plus light retries from Option 2 to reduce operational pain.


2 REPLIES


tt_921
New Contributor II

Thank you very much for the detailed response! Unfortunately we can't proceed with Option 1, as we do require multiple places that can trigger the pipeline (an API call to the parent job, and a direct API call to the pipeline itself). This is due to the specific configurable options available in a pipeline API call vs. a job API call, namely `full_refresh_selection` to fully refresh specific tables.

We do have queue enabled at the job level and a small `max_retries` on the pipeline.

For now it seems we will need to open a ticket and wait until the pipeline execution queue is fully rolled out.