In Delta Live Tables (DLT), native conditional or branch-based control flow is limited: every table or view declared in your pipeline is part of the graph for each update. Datasets are defined with the @dlt.table and @dlt.view decorators, and dependencies are established by reading other datasets with dlt.read(). You can't dynamically skip or instantiate pipeline steps based purely on DataFrame content as you might in classic ETL tools, but there are best-practice patterns that approximate conditional logic for cases like threshold-based validation, especially when the branches have differing schemas.
Core Recommendation
You should separate your validation check from your log/error handling steps, and use filtering and table dependencies to "gate" data to each path, rather than trying to switch between DataFrames with differing schemas in the same step.
Pattern: Split Tables and Data Filters
- Validation Table: Create a DLT table or view that applies your threshold check and outputs only the rows that pass.
- Failure Logs Table: Create a separate DLT table or view that selects only the rows that fail the check and transforms them into your failure schema. You can derive this from the same base input (or from the validation table, applying the inverse filter).
- Downstream Consumption: Downstream steps see rows in a given table only when the corresponding condition occurs on that run. You can avoid processing zero-row tables by adding checks in your final write logic, though DLT will still instantiate both tables as part of the pipeline graph.
Example:
import dlt

threshold = 100  # example threshold; in practice this might come from pipeline configuration

@dlt.table
def raw_data():
    return spark.read.table("source_table")

@dlt.table
def passed_data():
    df = dlt.read("raw_data")
    return df.filter(df.metric > threshold)

@dlt.table
def failed_logs():
    df = dlt.read("raw_data")
    failed = df.filter(df.metric <= threshold)
    # Map failed rows onto the failure schema
    return failed.selectExpr("id", "metric", "'metric below threshold' AS logMessage")
This way, only the appropriate table receives data per run, but both destinations are defined in your graph.
Avoiding Zero-Row Tables
- Optional: Gate your final step so nothing is written downstream when a table has no data. DLT will still materialize the step, but if the table/view is empty, nothing is written or processed downstream (see the sketch after this list).
- This does not prevent DLT from managing the table, but it can keep your metrics cleaner.
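For illustration, a minimal sketch of such a gated downstream step, building on the failed_logs table from the example above (the failure_alerts name and the alertTime column are illustrative, not part of the original pattern):
import dlt
from pyspark.sql import functions as F

@dlt.table
def failure_alerts():
    # On runs with no failing rows, failed_logs is empty, so this table is
    # still materialized but contains zero rows and nothing flows downstream.
    failed = dlt.read("failed_logs")
    return failed.select(
        "id",
        "logMessage",
        F.current_timestamp().alias("alertTime"),
    )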
Why Not Dynamic Branching?
DLT's declarative framework resolves the pipeline graph when an update starts, so you can't add or skip steps based on what the data contains at runtime. However, by using DataFrame filters and data-dependent table population, you can mimic conditional behavior within the limitations of the framework.
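Decisions that are known before the update starts can still shape the filters, even though the graph itself is fixed. As a minimal sketch, here is a variant of the passed_data table from the earlier example, assuming the threshold is supplied as a pipeline configuration value under the illustrative key validation.threshold:
import dlt

# Pipeline configuration is read once, when the update starts; the fallback
# value of 100 is only for this sketch.
threshold = float(spark.conf.get("validation.threshold", "100"))

@dlt.table
def passed_data():
    df = dlt.read("raw_data")
    # The conditional logic lives in the filter expression, not in the graph.
    return df.filter(df.metric > threshold)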
Handling Different Schemas
- Never return DataFrames with different schemas from the same output step. Instead, write to separately defined tables/views, each with a fixed schema, so DLT's schema expectations are satisfied (see the sketch below). This keeps the pipeline reliable and clear.
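If you want each destination's schema to be explicit rather than inferred, the @dlt.table decorator accepts a schema definition. A minimal sketch of the two tables from the earlier example with pinned schemas (the column types here are assumptions):
import dlt

@dlt.table(schema="id BIGINT, metric DOUBLE")
def passed_data():
    df = dlt.read("raw_data")
    return df.select("id", "metric").filter("metric > 100")

@dlt.table(schema="id BIGINT, metric DOUBLE, logMessage STRING")
def failed_logs():
    df = dlt.read("raw_data")
    # Failed rows are reshaped to match the fixed failure schema.
    return df.filter("metric <= 100").selectExpr(
        "id", "metric", "'metric below threshold' AS logMessage"
    )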
Summary Table
| Requirement | DLT Solution | Notes |
|---|---|---|
| Conditional logic | Data filters | Split via filtered tables/views |
| Different schemas | Distinct tables/views | Never change schema mid-pipeline |
| Output only logs on fail | Logs table populated only if failed rows exist | DLT materializes the table regardless; avoid zero-result processing downstream |
References
- Databricks forums and official docs recommend this pattern for validations, error handling, and conditional branches in DLT.
If you need nested conditional logic or true "branching," you might need to orchestrate pipelines with external tools (Databricks Jobs, Airflow, etc.) or apply threshold checks upstream before ingesting into DLT. For most operational use cases, filtering/gating via tables as described above is the validated approach.
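As a sketch of the external-orchestration route, a Databricks Jobs notebook task could decide whether to trigger a DLT pipeline update at all, using the Databricks Python SDK (the pipeline ID and table name below are placeholders):
from databricks.sdk import WorkspaceClient

# Hypothetical pre-check running outside DLT, e.g. in a Jobs notebook task.
w = WorkspaceClient()
new_rows = spark.read.table("source_table").count()

if new_rows > 0:
    # Start an update of the DLT pipeline only when there is data to process.
    w.pipelines.start_update(pipeline_id="<your-pipeline-id>")
else:
    print("No new data; skipping the DLT pipeline update.")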