cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Delta Live Tables @expect compare tables count between two stages

140015
New Contributor III

Hello,

I'm wondering if there is an option to make an expectation on DLT that will compare the number of records between two stages and e.g. fail if there is a difference between those counts?

I mean something like this:

@dlt.table()

def bronze():

Some transformations

@dlt.expect_or_fail("equal_number_of_records", "bronze_table.count() == silver_table.count()"

@dlt.table()

def silver():

Some transformations

6 REPLIES 6

-werners-
Esteemed Contributor III

What if you add a count("*") column to silver, and compare that to the count of bronze (which you put into a variable first)? Like that you compare a column to a scalar value, which I believe will work.

140015
New Contributor III

Unfortunately, it didnt work that way, expect don't see scalar values saved as variables

@dlt.expect_or_fail("equal_number_of_records", "qa_silver_row_count == bronze_count")

@dlt.table()

def silver():

bronze_count == bronze_table.count()

silver_table = # transformations on bronze table

silver_table = silver_table.withColumn("qa_silver_row_count", F.lit(silver_table.count()))

The way I manage to make it run is:

@dlt.expect_or_fail("equal_number_of_records", "qa_silver_row_count == qa_bronze_row_count")

@dlt.table()

def silver():

bronze_table = #load bronze table

   silver_table = upstream_table.withColumn("qa_bronze_row_count", F.lit(bronze_table..count()))

silver_table = # transformations on bronze table

silver_table = silver_table.withColumn("qa_silver_row_count", F.lit(silver_table.count()))

It is a little bit cumbersome. It is a little annoying that DLT actually makes row count automatically but it can't be easily accessed. Maybe it is possible to get this data from the event log table I will try to find out

140015
New Contributor III

I dig into DLT docs and found this https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-cookbook.html#validate-row... . I guess it solves my problem.

-werners-
Esteemed Contributor III

great find!

Hubert-Dudek
Esteemed Contributor III

The problem is that expectations are deterministic, so SQL query in "expect" needs always to give the same result (for that reason, it is not possible to add that timestamp is from the last 60 minutes in "expect", but the timestamp may be greater than the hardcoded date). That's why it is proposed to create an additional table for checks instead of using "expect".

Anonymous
Not applicable

Hi @Jacek Dembowiak​ 

Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. 

We'd love to hear from you.

Thanks!

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.