cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

How to use dlt.expect to validate table level constraints?

guangyi
Contributor

I know how to validate the column level constraint, like checking whether the specified column value is larger than target value.

Can I validate some table level constraints? For example, validate whether the total records count of a table is larger than the specified number. Or validate the foreign key between two tables.

I tried this way but it is now working:

@Dlt.expect("valid records count", "count('*') > 7500000")

I also tried to do `expect` inside the DLT pipeline function. The function can run successfully but I cannot find the result in the quality tab or the in event logs

@Dlt.table
def bronze_table():
  df = spark.read.table("samples.tpch.orders")
  dlt.expect("valid records count", df.count() > 7500000)
  return df

Can I achieve this with dlt.expect?

1 ACCEPTED SOLUTION

Accepted Solutions

szymon_dybczak
Contributor

Hi @guangyi ,

Unfortunately, there is no out of the box solution for this requirement in dlt. But as a workaround you can add an additional view/table to your pipeline that defines an expectation in similar way to below:

 

CREATE OR REFRESH MATERIALIZED VIEW count_verification(
  CONSTRAINT no_rows_dropped EXPECT (a_count == your_numbers)
) AS SELECT * FROM
  (SELECT COUNT(*) AS a_count FROM LIVE.tbla)

 

The results of this expectation appear in the event log and the Delta Live Tables UI. 

Manage data quality with Delta Live Tables - Azure Databricks | Microsoft Learn

View solution in original post

2 REPLIES 2

szymon_dybczak
Contributor

Hi @guangyi ,

Unfortunately, there is no out of the box solution for this requirement in dlt. But as a workaround you can add an additional view/table to your pipeline that defines an expectation in similar way to below:

 

CREATE OR REFRESH MATERIALIZED VIEW count_verification(
  CONSTRAINT no_rows_dropped EXPECT (a_count == your_numbers)
) AS SELECT * FROM
  (SELECT COUNT(*) AS a_count FROM LIVE.tbla)

 

The results of this expectation appear in the event log and the Delta Live Tables UI. 

Manage data quality with Delta Live Tables - Azure Databricks | Microsoft Learn

Thank you

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group