
Validation with views - DLT pipeline expectations

mai_luca
New Contributor III

I have a question about how expectations work when applied to views inside a Delta Live Tables (DLT) pipeline. For instance, suppose we define this view inside a pipeline to stop the pipeline if we spot some duplicates:

import dlt
from pyspark.sql.functions import count

@dlt.view(
    name=view_name,
)
@dlt.expect_or_fail(expectation, "duplicate = 1")
def generate_report():
    # One row per key group; any group with duplicate > 1 violates the expectation
    return (
        dlt.read(table)
        .groupBy(keys)
        .agg(count("*").alias("duplicate"))
    )

What is happening behind the scenes? Is the expectation validated as a "temporary table" (computed but not stored in the catalog), or is it validated only if the view is used by a table defined in the pipeline?

ACCEPTED SOLUTION

ilir_nuredini
Honored Contributor

Hello mai_luca

The expectation in your @dlt.expect_or_fail() is only evaluated if the view is used downstream by a materialized table. If the view is only referenced by another view, or not used at all, the expectation will not be evaluated. DLT will not materialize the view, and the check will be skipped entirely.

Behind the scenes, here's what happens:

1. Views in DLT are logical constructs. They are not stored in the metastore or physically persisted.

2. DLT evaluates views lazily. This means a view is only computed if it is referenced by something downstream, such as a table.

3. If an expectation is attached to a view, it is only applied when that view is actually evaluated.

So, if no downstream @dlt.table consumes view_name, the expectation will not run, and the pipeline will not fail, even if the data violates the condition.
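For example, a minimal sketch (the table name below is hypothetical, and view_name is the same placeholder used in the question): adding any downstream table that reads the view forces DLT to compute generate_report(), so the expect_or_fail check actually runs.

import dlt

# Hypothetical downstream table: reading the view makes DLT materialize it,
# so the expectation attached to generate_report() is evaluated.
@dlt.table(name="duplicate_report")
def duplicate_report():
    return dlt.read(view_name)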


Best,
Ilir


5 REPLIES

mai_luca
New Contributor III

I also noticed that Expectation recommendations and advanced patterns | Databricks Documentation suggests using @dlt.view in Python but CREATE OR REFRESH MATERIALIZED VIEW in SQL, which confuses me...


mai_luca
New Contributor III

@ilir_nuredini That totally makes sense, thanks for confirming it. I would say the best way to validate tables with expectations is to use private materialized views rather than the views suggested in the link.

Note: private materialized views were previously created with the TEMPORARY parameter. See CREATE MATERIALIZED VIEW (Lakeflow Declarative Pipelines) | Databricks Documentation.
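A sketch of that pattern, assuming the same table, keys, and expectation placeholders as in the original snippet, and assuming @dlt.table(temporary=True) as the Python counterpart of the TEMPORARY/private materialized view mentioned above:

import dlt
from pyspark.sql.functions import count

# Hypothetical validation dataset: temporary=True keeps the materialized
# result out of the target catalog while still forcing the expectation to
# run on every update.
@dlt.table(
    name="duplicate_validation",
    temporary=True,
)
@dlt.expect_or_fail(expectation, "duplicate = 1")
def duplicate_validation():
    return (
        dlt.read(table)
        .groupBy(keys)
        .agg(count("*").alias("duplicate"))
    )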

ilir_nuredini
Honored Contributor

That is right, and thank you. It would be great if you could mark the reply as the accepted solution so future colleagues can use this thread as a reference.

Best,
Ilir

Yogesh_378691
Contributor

In DLT, expectations defined with dlt.expect_or_fail() on views are only evaluated if the view is used downstream by a materialized table. Since views are logical and lazily evaluated, if no table depends on the view, the expectation is skipped and the pipeline won’t fail—even if the data violates the condition.

Yogesh Verma
