Conditional Execution in DLT Pipeline based on the output

rpilli — Sun, 25 Aug 2024 21:35:47 GMT

Hello ,

I'm working on a Delta Live Tables (DLT) pipeline where I need to implement a conditional step that only triggers under specific conditions. Here's the challenge I'm facing:

I have a function that checks if the data meets certain thresholds. If the data passes, the function returns the original DataFrame. If the thresholds are breached, it returns a different DataFrame containing logs or failure statistics.
The main challenge is that these two DataFrames have different schemas, which is causing difficulties in the pipeline
I only want to initiate the logs-related steps in DLT when the threshold check fails. Currently, my pipeline writes two outputs regardless: one for the logs and one for the passed data, even if the passed data count is zero. This isn't the behavior I want.

Question:
Is there a way to structure the DLT pipeline so that the logs path graph or process, is only initiated when the threshold check fails? Ideally, l'm looking for a nested or conditional DLT step that only runs when the threshold validation fails.

The fact that DLT doesn't have built-in flow control mechanism like ETL tools, is challenging.

Any guidance or best practices for achieving this would be greatly appreciated!
Thanks in advance for your help!

Re: Conditional Execution in DLT Pipeline based on the output

mark_ott — Sun, 16 Nov 2025 17:48:23 GMT

In Delta Live Tables (DLT), native conditional or branch-based control flow is limited; all table/stream definitions declared in your pipeline will execute, and dependencies are handled via @Dlt.table or @Dlt.view decorators. You can’t dynamically skip or instantiate pipeline steps based purely on DataFrame content as you might in classic ETL tools—but there are best-practice patterns to approximate conditional logic for cases like threshold-based validation, especially with differing schemas.

Core Recommendation

You should separate your validation check from your log/error handling steps, and use filtering and table dependencies to "gate" data to each path, rather than trying to switch between DataFrames with differing schemas in the same step.

Pattern: Split Tables and Data Filters

Validation Table:
Create a DLT table or view that applies your threshold check and only outputs rows passing the check.
Failure Logs Table:
Create a separate DLT table or view that selects only rows failing your check, and transforms them into your failure schema. You can derive this from the same base input (or from the validation table, applying the inverse filter).
Downstream Consumption:
Downstream steps will only see non-empty tables if the corresponding condition occurs. You can avoid writing zero-row tables by adding checks or limits in your final write logic, though DLT will still instantiate both tables as part of the pipeline graph.

Example:

python

import dlt @Dlt.table def raw_data(): return spark.read.table("source_table") @Dlt.table def passed_data(): df = dlt.read("raw_data") return df.filter(df.metric > threshold) @Dlt.table def failed_logs(): df = dlt.read("raw_data") failed = df.filter(df.metric <= threshold) # Map failure schema return failed.selectExpr("id", "metric", "logMessage")

This way, only the appropriate table receives data per run, but both destinations are defined in your graph.

Avoiding Zero-Row Tables

Optional: Use a final step to avoid writing downstream if a table has no data. DLT will materialize the step, but nothing will be written or processed downstream if the table/view is empty.
This does not prevent DLT from managing the table, but can keep your metrics cleaner.

Why Not Dynamic Branching?

DLT's declarative framework builds graphs at pipeline start time—you can't declare pipeline steps dynamically. However, by using DataFrame filters and data-dependent table population, you mimic conditional behavior within the limitations of the framework.

Handling Different Schemas

Never return different schema DataFrames from the same output step. Instead, output to separately-defined tables/views, each with a fixed schema, so DLT's schema expectations are satisfied. This keeps the pipeline reliable and clear.

Summary Table

Requirement	DLT Solution	Notes
Conditional logic	Data filters	Split via filtered tables/views
Different schemas	Distinct tables/views	Never change schema mid-pipeline
Output only logs on fail	Logs table populated only if failed rows exist	DLT materializes the table regardless; avoid zero-result processing downstream

References

Databricks forums and official docs recommend this pattern for validations, error handling, and conditional branches in DLT.

If you need nested conditional logic or true "branching," you might need to orchestrate pipelines via external tools (Databricks Jobs, Airflow, etc.) or preprocess thresholds before ingesting to DLT. For most operational use cases, filtering/gating via tables as described above is the validated approach.

topic Re: Conditional Execution in DLT Pipeline based on the output in Data Engineering