<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: how can I verify that the result of a dlt will have enough rows before updating the table? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-can-i-verify-that-the-result-of-a-dlt-will-have-enough-rows/m-p/126168#M47651</link>
    <description>&lt;P&gt;Thank you for the quick reply.&lt;/P&gt;&lt;P&gt;Is there a common or recommended way (or any way at all) to work around this limitation? I don't mind not using the expectations API if it doesn't support logic based on aggregations.&lt;/P&gt;</description>
    <pubDate>Wed, 23 Jul 2025 11:44:49 GMT</pubDate>
    <dc:creator>yuinagam</dc:creator>
    <dc:date>2025-07-23T11:44:49Z</dc:date>
    <item>
      <title>how can I verify that the result of a dlt will have enough rows before updating the table?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-verify-that-the-result-of-a-dlt-will-have-enough-rows/m-p/126157#M47647</link>
      <description>&lt;P&gt;I have a DLT/Lakeflow pipeline that creates a table, and I need to make sure that it only updates the resulting materialized view if the new result will have more than one million records.&lt;/P&gt;&lt;P&gt;I've found &lt;A href="https://docs.databricks.com/aws/en/dlt/expectation-patterns?language=Python#row-count-validation" target="_blank"&gt;this&lt;/A&gt;, but it seems to work only if the table I want to validate has already been updated and I then validate it afterwards with a separate job. That doesn't work for me, because I need to ensure that at no point does the table have too few rows. When I tried it within a single pipeline (creating a temporary version of the table, verifying that temporary table, and creating the final table only if the check passed), I ran into a problem where `dlt.read("table_name").count()` always equals zero, even though I can count the table's rows once it has been created and get a nonzero number.&lt;/P&gt;&lt;P&gt;I've also tried using `count(1)` directly in the `dlt.expect_or_fail` decorator, but that always results in an error and doesn't seem to be supported.&lt;/P&gt;&lt;P&gt;In general, the question is: how can I verify conditions that involve aggregations over the data in a DLT pipeline, and apply the update only if the verification succeeds?&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jul 2025 11:30:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-verify-that-the-result-of-a-dlt-will-have-enough-rows/m-p/126157#M47647</guid>
      <dc:creator>yuinagam</dc:creator>
      <dc:date>2025-07-23T11:30:16Z</dc:date>
    </item>
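    <!--
A framework-agnostic way to think about the requirement in the question is the write-audit-publish pattern: build the new result in a staging location, check the aggregate condition on the complete result, and replace the published table only if the check passes. The sketch below is plain Python, not DLT code. The dict standing in for a catalog, the names `publish_if_valid` and `MIN_ROWS`, and the `__staging` suffix are all hypothetical, chosen only to show the control flow.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins: a "row" is a dict and the "catalog" maps table names
# to lists of rows. Nothing here is a DLT or Spark API.
Row = Dict[str, Any]
Catalog = Dict[str, List[Row]]

MIN_ROWS = 1_000_000  # threshold from the question: "more than one million records"


def publish_if_valid(
    catalog: Catalog,
    target: str,
    build: Callable[[], List[Row]],
    min_rows: int = MIN_ROWS,
) -> bool:
    """Build into staging, audit the row count, then publish or discard.

    Returns True if `target` was replaced, False if the audit failed.
    """
    staging = f"{target}__staging"

    # Write: materialize the new result where readers of `target` never look.
    catalog[staging] = build()

    # Audit: the aggregate is computed over the complete staged result.
    if len(catalog[staging]) <= min_rows:
        del catalog[staging]  # discard; the published version is untouched
        return False

    # Publish: swap the validated result into place in one assignment.
    catalog[target] = catalog.pop(staging)
    return True


if __name__ == "__main__":
    demo: Catalog = {"mv": [{"id": 0}]}
    # Too few rows for a threshold of 5: the old contents survive.
    print(publish_if_valid(demo, "mv", lambda: [{"id": i} for i in range(3)], min_rows=5))
    print(len(demo["mv"]))
    # Enough rows: the new result replaces the old one.
    print(publish_if_valid(demo, "mv", lambda: [{"id": i} for i in range(10)], min_rows=5))
    print(len(demo["mv"]))
```

The check rejects anything with `<= min_rows` rows, so a result with exactly one million rows is not published, matching "more than one million records". When the audit fails, the previously published rows are never touched.
    -->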
    <item>
      <title>Re: how can I verify that the result of a dlt will have enough rows before updating the table?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-verify-that-the-result-of-a-dlt-will-have-enough-rows/m-p/126161#M47649</link>
      <description>&lt;P&gt;Currently, DLT doesn’t natively support applying expectations or conditional logic based on aggregate metrics such as row count within a single pipeline step. That’s why using an aggregate like `count(1)` in `dlt.expect_or_fail`, or counting rows with `.count()` on a DLT table inside the pipeline, doesn’t work as expected.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jul 2025 11:38:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-verify-that-the-result-of-a-dlt-will-have-enough-rows/m-p/126161#M47649</guid>
      <dc:creator>mariadawson</dc:creator>
      <dc:date>2025-07-23T11:38:54Z</dc:date>
    </item>
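    <!--
One way to read the limitation described above: an expectation is a per-record predicate, so a constraint like `count(1) > 1000000` has no set of rows to aggregate over. The row-count validation pattern linked in the original question gets around this by first collapsing the data into a one-row table of counts and then applying an ordinary expectation to that single row. A plain-Python model of that idea follows. Here `expect_or_fail` is a hypothetical stand-in that only imitates the fail-on-violation behavior of `dlt.expect_or_fail`, and `row_count_table` and `validate_min_rows` are likewise illustrative names.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


class ExpectationFailed(Exception):
    """Raised when a fail-type expectation is violated (models expect_or_fail)."""


def expect_or_fail(rows: List[Row], name: str, predicate: Predicate) -> List[Row]:
    # A row-level check: the predicate only ever sees one record at a time,
    # so it cannot know how many records exist in total.
    for row in rows:
        if not predicate(row):
            raise ExpectationFailed(f"expectation {name!r} failed on {row!r}")
    return rows


def row_count_table(rows: List[Row]) -> List[Row]:
    # Collapse the dataset into a single row holding the aggregate,
    # analogous to SELECT COUNT(*) AS row_count FROM staging.
    return [{"row_count": len(rows)}]


def validate_min_rows(rows: List[Row], min_rows: int) -> None:
    # The aggregate condition becomes an ordinary row-level predicate
    # evaluated against the one-row summary table.
    expect_or_fail(
        row_count_table(rows),
        "min_rows",
        lambda r: r["row_count"] > min_rows,
    )
```

On its own, a failing check like this only stops the run; it protects the published table only if the publishing step runs after, and depends on, the check.
    -->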
    <item>
      <title>Re: how can I verify that the result of a dlt will have enough rows before updating the table?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-verify-that-the-result-of-a-dlt-will-have-enough-rows/m-p/126168#M47651</link>
      <description>&lt;P&gt;Thank you for the quick reply.&lt;/P&gt;&lt;P&gt;Is there a common or recommended way (or any way at all) to work around this limitation? I don't mind not using the expectations API if it doesn't support logic based on aggregations.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jul 2025 11:44:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-verify-that-the-result-of-a-dlt-will-have-enough-rows/m-p/126168#M47651</guid>
      <dc:creator>yuinagam</dc:creator>
      <dc:date>2025-07-23T11:44:49Z</dc:date>
    </item>
  </channel>
</rss>

