Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How can I verify that the result of a DLT pipeline will have enough rows before updating the table?

yuinagam
New Contributor II

I have a DLT/Lakeflow pipeline that creates a table, and I need to make sure that it only updates the resulting materialized view if the result has more than one million records.

I've found this, but it seems to work only after the table I want to validate has already been updated, with the validation running as a separate job afterwards. That wouldn't work for me, because I need to ensure that at no point the table has too few rows. When I tried it within a single pipeline (creating a temporary version of the table, verifying that temporary table, and only creating the final table if the check passed), I ran into a problem where `dlt.read("table_name").count()` always equals zero, even though once the table is created I can count its rows and get more.
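
For reference, the single-pipeline attempt was structured roughly like the sketch below (a hypothetical reconstruction, not my exact code; the table names, source path, and threshold are placeholders, and `spark` is the session available in the pipeline notebook):

```python
import dlt

@dlt.table(name="my_table_tmp", temporary=True)
def my_table_tmp():
    # Build the candidate result first as a temporary table.
    return spark.read.table("catalog.schema.source_table")

@dlt.table(name="my_table")
def my_table():
    # Intended gate: only materialize the final table when the temporary
    # table is large enough. In practice the count below always came back
    # as 0 during the pipeline update, so the gate never passes.
    if dlt.read("my_table_tmp").count() <= 1_000_000:
        raise ValueError("my_table_tmp has too few rows")
    return dlt.read("my_table_tmp")
```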

I've also tried simply using `count(1)` in the `dlt.expect_or_fail` decorator, but that always results in an error and doesn't seem to be supported.
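
(That attempt looked roughly like the following, with hypothetical names; it presumably fails because expectation conditions are evaluated against each record, so aggregate functions are not accepted there.)

```python
@dlt.table(name="my_table")
# Rejected: aggregates such as count(1) are not valid in a row-level expectation.
@dlt.expect_or_fail("enough_rows", "count(1) > 1000000")
def my_table():
    return dlt.read("my_table_tmp")
```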

 

In general, the question is: how can I verify conditions that involve aggregations over the data in a DLT pipeline, and apply the update only if the verification succeeds?

2 REPLIES

mariadawson
New Contributor III

Currently, DLT doesn't natively support applying expectations or conditional logic based on aggregate metrics such as row counts within a single pipeline step. That's why `dlt.expect_or_fail` with an aggregate condition, and counting the rows of a DLT table from inside the pipeline, don't work as expected.
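
One pattern that is sometimes used to work around this is to collapse the dataset into a single row that carries the aggregate, and apply a row-level expectation to that row so a failed check fails the update. A minimal sketch, assuming a source dataset named `my_source` defined in the same pipeline and a one-million-row threshold (all names here are placeholders):

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(
    name="my_source_rowcount_check",
    temporary=True,
    comment="Single-row validation table carrying the row count of my_source.",
)
@dlt.expect_or_fail("at_least_one_million_rows", "row_count >= 1000000")
def my_source_rowcount_check():
    # Aggregate the whole dataset into one row so the row-level
    # expectation can evaluate the aggregate value.
    return dlt.read("my_source").agg(F.count("*").alias("row_count"))
```

Whether a failed check here actually blocks the refresh of the downstream materialized view depends on where the check sits in the dependency graph, so it's worth confirming the failure behavior on a test pipeline before relying on it.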

yuinagam
New Contributor II

Thank you for the quick reply.

Is the aggregate-check pattern you sketched the common/recommended way to work around this limitation, or is there a better approach? I don't mind not using the expectations API if it doesn't support logic based on aggregations.