<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta Live Tables @expect compare tables count between two stages in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-expect-compare-tables-count-between-two-stages/m-p/28227#M20050</link>
    <description>&lt;P&gt;The problem is that expectations must be deterministic, so the SQL expression in "expect" always needs to give the same result. For that reason, you cannot check in "expect" that a timestamp falls within the last 60 minutes, but you can check that it is later than a hardcoded date. That's why the proposed approach is to create an additional table for the checks instead of doing the comparison in "expect".&lt;/P&gt;</description>
    <pubDate>Sun, 16 Oct 2022 16:28:45 GMT</pubDate>
    <dc:creator>Hubert-Dudek</dc:creator>
    <dc:date>2022-10-16T16:28:45Z</dc:date>
    <item>
      <title>Delta Live Tables @expect compare tables count between two stages</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-expect-compare-tables-count-between-two-stages/m-p/28222#M20045</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Is there an option in DLT to define an expectation that compares the number of records between two stages and, e.g., fails if the counts differ?&lt;/P&gt;&lt;P&gt;I mean something like this:&lt;/P&gt;&lt;P&gt;@dlt.table()&lt;/P&gt;&lt;P&gt;def bronze():&lt;/P&gt;&lt;P&gt;   # some transformations&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;@dlt.expect_or_fail("equal_number_of_records", "bronze_table.count() == silver_table.count()")&lt;/P&gt;&lt;P&gt;@dlt.table()&lt;/P&gt;&lt;P&gt;def silver():&lt;/P&gt;&lt;P&gt;   # some transformations&lt;/P&gt;</description>
      <pubDate>Mon, 10 Oct 2022 08:49:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-expect-compare-tables-count-between-two-stages/m-p/28222#M20045</guid>
      <dc:creator>140015</dc:creator>
      <dc:date>2022-10-10T08:49:43Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables @expect compare tables count between two stages</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-expect-compare-tables-count-between-two-stages/m-p/28223#M20046</link>
      <description>&lt;P&gt;What if you add a count("*") column to silver and compare it to the count of bronze (which you put into a variable first)? That way you compare a column to a scalar value, which I believe will work.&lt;/P&gt;</description>
      <pubDate>Mon, 10 Oct 2022 11:28:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-expect-compare-tables-count-between-two-stages/m-p/28223#M20046</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-10-10T11:28:34Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables @expect compare tables count between two stages</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-expect-compare-tables-count-between-two-stages/m-p/28224#M20047</link>
      <description>&lt;P&gt;Unfortunately, it didn't work that way: expect doesn't see scalar values saved as variables.&lt;/P&gt;&lt;P&gt;@dlt.expect_or_fail("equal_number_of_records", "qa_silver_row_count == bronze_count")&lt;/P&gt;&lt;P&gt;@dlt.table()&lt;/P&gt;&lt;P&gt;def silver():&lt;/P&gt;&lt;P&gt;   bronze_count = bronze_table.count()&lt;/P&gt;&lt;P&gt;   silver_table = # transformations on bronze table&lt;/P&gt;&lt;P&gt;   silver_table = silver_table.withColumn("qa_silver_row_count", F.lit(silver_table.count()))&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The way I managed to make it run is:&lt;/P&gt;&lt;P&gt;@dlt.expect_or_fail("equal_number_of_records", "qa_silver_row_count == qa_bronze_row_count")&lt;/P&gt;&lt;P&gt;@dlt.table()&lt;/P&gt;&lt;P&gt;def silver():&lt;/P&gt;&lt;P&gt;   bronze_table = # load bronze table&lt;/P&gt;&lt;P&gt;   silver_table = bronze_table.withColumn("qa_bronze_row_count", F.lit(bronze_table.count()))&lt;/P&gt;&lt;P&gt;   silver_table = # transformations on bronze table&lt;/P&gt;&lt;P&gt;   silver_table = silver_table.withColumn("qa_silver_row_count", F.lit(silver_table.count()))&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It is a bit cumbersome, and it is a little annoying that DLT already computes row counts automatically but they can't be easily accessed. Maybe it is possible to get this data from the &lt;A href="https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-event-log.html" alt="https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-event-log.html" target="_blank"&gt;event log table&lt;/A&gt;; I will try to find out.&lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2022 14:09:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-expect-compare-tables-count-between-two-stages/m-p/28224#M20047</guid>
      <dc:creator>140015</dc:creator>
      <dc:date>2022-10-11T14:09:42Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables @expect compare tables count between two stages</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-expect-compare-tables-count-between-two-stages/m-p/28225#M20048</link>
      <description>&lt;P&gt;I dug into the DLT docs and found this: &lt;A href="https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-cookbook.html#validate-row-counts-across-tables" target="test_blank"&gt;https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-cookbook.html#validate-row-counts-across-tables&lt;/A&gt;. I guess it solves my problem.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Oct 2022 11:29:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-expect-compare-tables-count-between-two-stages/m-p/28225#M20048</guid>
      <dc:creator>140015</dc:creator>
      <dc:date>2022-10-12T11:29:18Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables @expect compare tables count between two stages</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-expect-compare-tables-count-between-two-stages/m-p/28226#M20049</link>
      <description>&lt;P&gt;Great find!&lt;/P&gt;</description>
      <pubDate>Wed, 12 Oct 2022 11:39:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-expect-compare-tables-count-between-two-stages/m-p/28226#M20049</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-10-12T11:39:37Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables @expect compare tables count between two stages</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-expect-compare-tables-count-between-two-stages/m-p/28227#M20050</link>
      <description>&lt;P&gt;The problem is that expectations must be deterministic, so the SQL expression in "expect" always needs to give the same result. For that reason, you cannot check in "expect" that a timestamp falls within the last 60 minutes, but you can check that it is later than a hardcoded date. That's why the proposed approach is to create an additional table for the checks instead of doing the comparison in "expect".&lt;/P&gt;</description>
      <pubDate>Sun, 16 Oct 2022 16:28:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-expect-compare-tables-count-between-two-stages/m-p/28227#M20050</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-10-16T16:28:45Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Live Tables @expect compare tables count between two stages</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-live-tables-expect-compare-tables-count-between-two-stages/m-p/28228#M20051</link>
      <description>&lt;P&gt;Hi @Jacek Dembowiak,&lt;/P&gt;&lt;P&gt;Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 17 Nov 2022 06:15:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-live-tables-expect-compare-tables-count-between-two-stages/m-p/28228#M20051</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-11-17T06:15:06Z</dc:date>
    </item>
  </channel>
</rss>

