cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Delta Live Tables @expect compare tables count between two stages

140015
New Contributor III

Hello,

I'm wondering if there is an option to make an expectation on DLT that will compare the number of records between two stages and e.g. fail if there is a difference between those counts?

I mean something like this:

@dlt.table()

def bronze():

Some transformations

@dlt.expect_or_fail("equal_number_of_records", "bronze_table.count() == silver_table.count()"

@dlt.table()

def silver():

Some transformations

6 REPLIES 6

-werners-
Esteemed Contributor III

What if you add a count("*") column to silver, and compare that to the count of bronze (which you put into a variable first)? Like that you compare a column to a scalar value, which I believe will work.

140015
New Contributor III

Unfortunately, it didnt work that way, expect don't see scalar values saved as variables

@dlt.expect_or_fail("equal_number_of_records", "qa_silver_row_count == bronze_count")

@dlt.table()

def silver():

bronze_count == bronze_table.count()

silver_table = # transformations on bronze table

silver_table = silver_table.withColumn("qa_silver_row_count", F.lit(silver_table.count()))

The way I manage to make it run is:

@dlt.expect_or_fail("equal_number_of_records", "qa_silver_row_count == qa_bronze_row_count")

@dlt.table()

def silver():

bronze_table = #load bronze table

   silver_table = upstream_table.withColumn("qa_bronze_row_count", F.lit(bronze_table..count()))

silver_table = # transformations on bronze table

silver_table = silver_table.withColumn("qa_silver_row_count", F.lit(silver_table.count()))

It is a little bit cumbersome. It is a little annoying that DLT actually makes row count automatically but it can't be easily accessed. Maybe it is possible to get this data from the event log table I will try to find out

140015
New Contributor III

I dig into DLT docs and found this https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-cookbook.html#validate-row... . I guess it solves my problem.

-werners-
Esteemed Contributor III

great find!

Hubert-Dudek
Esteemed Contributor III

The problem is that expectations are deterministic, so SQL query in "expect" needs always to give the same result (for that reason, it is not possible to add that timestamp is from the last 60 minutes in "expect", but the timestamp may be greater than the hardcoded date). That's why it is proposed to create an additional table for checks instead of using "expect".

Anonymous
Not applicable

Hi @Jacek Dembowiakโ€‹ 

Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. 

We'd love to hear from you.

Thanks!

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group