06-19-2025 06:05 AM
I have a question about how expectations work when applied to views inside a Delta Live Tables (DLT) pipeline. For instance, suppose we define this view inside a pipeline to stop the pipeline if we spot some duplicates:
import dlt
from pyspark.sql.functions import count

@dlt.view(
    name=view_name,
)
@dlt.expect_or_fail(expectation, "duplicate = 1")
def generate_report():
    return (
        dlt.read(table)
        .groupBy(keys)
        .agg(count("*").alias("duplicate"))
    )
What is happening behind the scenes? Is the expectation validated as a "temporary table" (computed but not stored in the catalog), or is it validated only if the view is used by a table defined in the pipeline?
06-19-2025 08:02 AM
Hello mai_luca
The expectation in your @dlt.expect_or_fail() is only evaluated if the view is used downstream by a materialized table. If the view is only referenced by another view, or not used at all, the expectation will not be evaluated. DLT will not materialize the view, and the check will be skipped entirely.
Behind the scenes, here's what happens:
1. Views in DLT are logical constructs. They are not stored in the metastore or physically persisted.
2. DLT evaluates views lazily. This means a view is only computed if it is referenced by something downstream, such as a table.
3. If an expectation is attached to a view, it is only applied when that view is actually evaluated.
So, if no downstream @dlt.table consumes view_name, the expectation will not run and the pipeline will not fail, even if the data violates the condition.
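The lazy-evaluation behavior above can be sketched with a small stand-in in plain Python. To be clear, this is a mock, not the real dlt API (the real dlt module only exists inside a pipeline runtime): view, expect_or_fail, and read_table here are stand-ins that mimic the semantics of registering a view lazily, wrapping it with an expectation, and forcing evaluation from a downstream consumer.

```python
# Minimal stand-in for DLT's lazy view semantics (illustrative only;
# the real dlt module exists only inside a pipeline runtime).
views = {}          # name -> zero-arg function producing rows
evaluated = []      # records which views were actually computed

def view(name):
    """Register the view body lazily: it does not run at definition time."""
    def decorator(fn):
        views[name] = fn
        return fn
    return decorator

def expect_or_fail(name, predicate):
    """Wrap a view body so rows are checked only when it is evaluated."""
    def decorator(fn):
        def checked():
            rows = fn()
            for row in rows:
                if not predicate(row):
                    raise ValueError(f"expectation {name!r} violated: {row}")
            return rows
        return checked
    return decorator

def read_table(name):
    """A downstream table reading the view forces its evaluation."""
    evaluated.append(name)
    return views[name]()

@view("dupes")
@expect_or_fail("no_duplicates", lambda row: row["duplicate"] == 1)
def dupes():
    return [{"key": "a", "duplicate": 1}, {"key": "b", "duplicate": 2}]

# Defining the view runs nothing: the expectation has not fired yet.
assert evaluated == []

# Only a downstream read triggers evaluation -- and the failure.
failure = None
try:
    read_table("dupes")
except ValueError as e:
    failure = str(e)
```

If read_table("dupes") is never called, the bad row is never seen and no error is raised, which mirrors why an unused DLT view's expectation is skipped entirely.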
Best,
Ilir
06-19-2025 07:58 AM
I also noticed that Expectation recommendations and advanced patterns | Databricks Documentation suggests using @dlt.view with Python but CREATE OR REFRESH MATERIALIZED VIEW with SQL, which confuses me...
06-19-2025 08:19 AM
@ilir_nuredini That totally makes sense, thanks for confirming it. I would say the best way to validate tables with expectations is to use private materialized views, not views as suggested in the link.
Note: private materialized views were previously created with the TEMPORARY parameter. CREATE MATERIALIZED VIEW (Lakeflow Declarative Pipelines) | Databricks Documentation
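The difference this makes can be sketched with another plain-Python stand-in (again a mock, not the real dlt API): a pipeline update evaluates every materialized table eagerly, so an expectation attached to it fires on every run, while the same check on a view that nothing reads never executes.

```python
import functools

# Stand-in contrasting a plain view with a materialized table (illustrative
# only -- not the real dlt API). An update computes every table eagerly, so
# a table-level expectation always fires; an unused view's never runs.
lazy_views = {}    # computed only if something reads them
eager_tables = {}  # computed on every pipeline update

def expect_or_fail(name, predicate):
    def decorator(fn):
        @functools.wraps(fn)  # keep fn.__name__ for registration below
        def checked():
            rows = fn()
            bad = [r for r in rows if not predicate(r)]
            if bad:
                raise ValueError(f"expectation {name!r} violated: {bad}")
            return rows
        return checked
    return decorator

def view(fn):
    lazy_views[fn.__name__] = fn
    return fn

def table(fn):
    eager_tables[fn.__name__] = fn
    return fn

def run_update():
    """Materialize every table eagerly; unused views stay idle."""
    errors = []
    for fn in eager_tables.values():
        try:
            fn()
        except ValueError as e:
            errors.append(str(e))
    return errors

bad_rows = [{"key": "a", "duplicate": 2}]  # a duplicated key

@view
@expect_or_fail("no_dupes_view", lambda r: r["duplicate"] == 1)
def unused_view():
    return bad_rows

@table
@expect_or_fail("no_dupes_table", lambda r: r["duplicate"] == 1)
def report_table():
    return bad_rows

# Only the table's expectation fires on an update.
errors = run_update()
```

In this sketch, run_update() surfaces only the no_dupes_table violation, while no_dupes_view is silently skipped, which is why putting the check on a (private) materialized view guarantees it runs on every update.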
06-19-2025 08:26 AM
That is right, and thank you. It would be great if you could mark the reply as the accepted solution so future colleagues can use this thread as a reference. Best, Ilir
06-19-2025 09:38 AM
In DLT, expectations defined with dlt.expect_or_fail() on views are only evaluated if the view is used downstream by a materialized table. Since views are logical and lazily evaluated, if no table depends on the view, the expectation is skipped and the pipeline won’t fail—even if the data violates the condition.