<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Lakeflow SDP expectations in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/154146#M54071</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/193958"&gt;@IM_01&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;event_log(TABLE(...)) always returns the entire pipeline’s event log, not just rows for that one dataset. Passing a table is just a shortcut to find the owning pipeline. It doesn’t filter the log to that table.&lt;/P&gt;
&lt;P&gt;To restrict the results to a specific table such as customers_summary_mv, add an explicit filter on the flow name (and usually the event type), as in the sample below.&lt;/P&gt;
&lt;DIV class="l8rrz21 _1ibi0s3do" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-sql p8i6j0e hljs language-sql _12n1b832"&gt;&lt;SPAN class="hljs-keyword"&gt;SELECT&lt;/SPAN&gt; &lt;SPAN class="hljs-operator"&gt;*&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;FROM&lt;/SPAN&gt; event_log(&lt;SPAN class="hljs-keyword"&gt;TABLE&lt;/SPAN&gt;(workspace.default.customers_summary_mv))
&lt;SPAN class="hljs-keyword"&gt;WHERE&lt;/SPAN&gt; event_type &lt;SPAN class="hljs-operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;'flow_progress'&lt;/SPAN&gt;
  &lt;SPAN class="hljs-keyword"&gt;AND&lt;/SPAN&gt; origin.flow_name &lt;SPAN class="hljs-operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;'customers_summary_mv'&lt;/SPAN&gt;;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="l8rrz23 _1ibi0s3d7 _1ibi0s332 _1ibi0s3dp _1ibi0s3bm _1ibi0s3ce"&gt;
&lt;DIV class="l8rrz25 _1ibi0s3dc"&gt;What are you looking for?&amp;nbsp; E&lt;SPAN&gt;xpectation metrics for that table?&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV class="l8rrz25 _1ibi0s3dc"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="l8rrz25 _1ibi0s3dc"&gt;
&lt;P class="p1"&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;STRONG&gt;&lt;I&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/I&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;I&gt;&lt;/I&gt;&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
    <pubDate>Sat, 11 Apr 2026 21:05:19 GMT</pubDate>
    <dc:creator>Ashwin_DSA</dc:creator>
    <dc:date>2026-04-11T21:05:19Z</dc:date>
    <item>
      <title>Lakeflow SDP expectations</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/153553#M53970</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there a way to get the number of warned, dropped, and failed records for each expectation? Currently I only see an aggregated count.&lt;/P&gt;</description>
      <pubDate>Mon, 06 Apr 2026 18:25:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/153553#M53970</guid>
      <dc:creator>IM_01</dc:creator>
      <dc:date>2026-04-06T18:25:44Z</dc:date>
    </item>
    <item>
      <title>Re: Lakeflow SDP expectations</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/153555#M53971</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/193958"&gt;@IM_01&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;You can’t change the UI to break out those numbers, but you can get per-expectation counts from the DLT (Lakeflow) event log. Each expectation entry records passed_records and failed_records; for EXPECT rules failed_records = warned rows, and for EXPECT … DROP ROW rules failed_records = dropped rows. Expectations configured with FAIL UPDATE don’t emit aggregate metrics.&lt;/P&gt;
&lt;P&gt;Here is a sample query you can run; just replace the DLT table name where it says my_dlt_table.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;WITH exploded AS (
  SELECT
    timestamp,
    explode(
      from_json(
        details:flow_progress:data_quality:expectations,
        'array&amp;lt;struct&amp;lt;name:string,dataset:string,passed_records:long,failed_records:long&amp;gt;&amp;gt;'
      )
    ) AS e
  FROM event_log(TABLE(my_dlt_table))
  WHERE details:flow_progress:data_quality IS NOT NULL
)
SELECT
  timestamp,
  e.name           AS expectation_name,
  e.dataset,
  e.passed_records,
  e.failed_records
FROM exploded
ORDER BY timestamp DESC, expectation_name;&lt;/LI-CODE&gt;
&lt;P&gt;I tested it on a sample table and it returned the per-expectation split. Is this what you were looking to see?&lt;/P&gt;
&lt;DIV id="tinyMceEditor_168935259e0521Ashwin_DSA_0" class="mceNonEditable lia-copypaste-placeholder"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="DLT_Expectations.png" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/25717i986B9C218DF6BC3E/image-size/large?v=v2&amp;amp;px=999" role="button" title="DLT_Expectations.png" alt="DLT_Expectations.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You can also take a look at the documentation &lt;/SPAN&gt;&lt;A style="font-family: inherit; background-color: #ffffff;" href="https://docs.databricks.com/aws/en/ldp/monitor-event-logs#data-quality-metrics" target="_blank"&gt;here&lt;/A&gt;&lt;SPAN&gt; for exploring&amp;nbsp;data quality / expectations metrics from the event log.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Hope this helps.&lt;/P&gt;
&lt;P class="p1"&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;STRONG&gt;&lt;I&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/I&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;I&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Apr 2026 19:05:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/153555#M53971</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-04-06T19:05:16Z</dc:date>
    </item>
    <item>
      <title>Re: Lakeflow SDP expectations</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/153651#M53987</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/216690"&gt;@Ashwin_DSA&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Apologies, I was referring to the event_log in this case:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;{"dropped_records":0,"warned_records":344,"expectations":[{"name":"valid_case1","dataset":"cat.sch.tb1","passed_records":2505,"failed_records":313},{"name":"valid_case2","dataset":"cat.sch.tb1","passed_records":2719,"failed_records":99}]}&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;So warned_records gives an aggregated count, right? And if fail is the action, does it just give failed_records in the expectations dictionary and no passed_records?&lt;/P&gt;</description>
      <pubDate>Tue, 07 Apr 2026 19:25:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/153651#M53987</guid>
      <dc:creator>IM_01</dc:creator>
      <dc:date>2026-04-07T19:25:49Z</dc:date>
    </item>
    <item>
      <title>Re: Lakeflow SDP expectations</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/153843#M54024</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/216690"&gt;@Ashwin_DSA&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Even when passing a table name to the event_log function, it returns all the rows. Could you please help with this?&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;select * from event_log(table(workspace.default.customers_summary_mv))&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 09 Apr 2026 08:44:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/153843#M54024</guid>
      <dc:creator>IM_01</dc:creator>
      <dc:date>2026-04-09T08:44:30Z</dc:date>
    </item>
    <item>
      <title>Re: Lakeflow SDP expectations</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/154145#M54070</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/193958"&gt;@IM_01&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Warned_records / dropped_records (top-level):&lt;/STRONG&gt;&amp;nbsp;These are aggregated per-dataset counts of unique rows that were warned or dropped in that micro-batch/update.&amp;nbsp;They are not a simple sum of failed_records across expectations, because the same row can fail multiple expectations. That row is counted once in warned_records but multiple times in expectations[*].failed_records.&lt;/P&gt;
&lt;P&gt;That’s why in your example:&lt;/P&gt;
&lt;DIV data-ui-element="code-block-container"&gt;
&lt;PRE&gt;"warned_records": 344,
"expectations": [
  {"name":"valid_case1", ... "failed_records":313},
  {"name":"valid_case2", ... "failed_records":99}
]&lt;/PRE&gt;
&lt;DIV&gt;
&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P&gt;344 ≠ 313 + 99 because some rows likely violated both valid_case1 and valid_case2.&lt;/P&gt;
&lt;P&gt;For fail expectations, the update aborts on the first violation, and data quality metrics are not recorded the same way as for warn/drop.&amp;nbsp;In practice, you do not get meaningful passed_records/failed_records metrics for that expectation in details.flow_progress.data_quality. Instead, the event log records an expectation-violation error event (with the expectation name and the offending record), but no aggregate count of how many rows would have failed. So yes, warned_records is an aggregated, deduplicated count at the dataset level, and no, a FAIL action does not behave like warn/drop in the metrics JSON: you generally won’t see reliable passed_records/failed_records for it, just the failure event.&lt;/P&gt;
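&lt;P&gt;As a quick illustration of the deduplication, here is a small plain-Python sketch (hypothetical data, not a Databricks API) showing how one row that violates two expectations is counted once in warned_records but twice across the per-expectation failed_records:&lt;/P&gt;

```python
# Hypothetical rows, each tagged with the expectations it violated.
rows = [
    {"id": 1, "violations": ["valid_case1"]},
    {"id": 2, "violations": ["valid_case1", "valid_case2"]},
    {"id": 3, "violations": []},
]

# Dataset-level count: each violating row is counted exactly once.
warned_records = sum(1 for r in rows if r["violations"])

# Per-expectation counts: a row is counted once per violated expectation.
failed_by_expectation = {}
for r in rows:
    for name in r["violations"]:
        failed_by_expectation[name] = failed_by_expectation.get(name, 0) + 1

print(warned_records)                       # 2 (rows 1 and 2, deduplicated)
print(sum(failed_by_expectation.values()))  # 3 (row 2 counted twice)
```

&lt;P&gt;The same effect explains the real numbers above: the sum of per-expectation failures exceeds the deduplicated dataset-level count whenever rows overlap.&lt;/P&gt;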
&lt;P class="p1"&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;STRONG&gt;&lt;I&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/I&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;I&gt;&lt;/I&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 11 Apr 2026 21:02:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/154145#M54070</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-04-11T21:02:59Z</dc:date>
    </item>
    <item>
      <title>Re: Lakeflow SDP expectations</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/154146#M54071</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/193958"&gt;@IM_01&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;event_log(TABLE(...)) always returns the entire pipeline’s event log, not just rows for that one dataset. Passing a table is just a shortcut to find the owning pipeline. It doesn’t filter the log to that table.&lt;/P&gt;
&lt;P&gt;To restrict the results to a specific table such as customers_summary_mv, add an explicit filter on the flow name (and usually the event type), as in the sample below.&lt;/P&gt;
&lt;DIV class="l8rrz21 _1ibi0s3do" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-sql p8i6j0e hljs language-sql _12n1b832"&gt;&lt;SPAN class="hljs-keyword"&gt;SELECT&lt;/SPAN&gt; &lt;SPAN class="hljs-operator"&gt;*&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;FROM&lt;/SPAN&gt; event_log(&lt;SPAN class="hljs-keyword"&gt;TABLE&lt;/SPAN&gt;(workspace.default.customers_summary_mv))
&lt;SPAN class="hljs-keyword"&gt;WHERE&lt;/SPAN&gt; event_type &lt;SPAN class="hljs-operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;'flow_progress'&lt;/SPAN&gt;
  &lt;SPAN class="hljs-keyword"&gt;AND&lt;/SPAN&gt; origin.flow_name &lt;SPAN class="hljs-operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;'customers_summary_mv'&lt;/SPAN&gt;;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="l8rrz23 _1ibi0s3d7 _1ibi0s332 _1ibi0s3dp _1ibi0s3bm _1ibi0s3ce"&gt;
&lt;DIV class="l8rrz25 _1ibi0s3dc"&gt;What are you looking for?&amp;nbsp; E&lt;SPAN&gt;xpectation metrics for that table?&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV class="l8rrz25 _1ibi0s3dc"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="l8rrz25 _1ibi0s3dc"&gt;
&lt;P class="p1"&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;STRONG&gt;&lt;I&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/I&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;I&gt;&lt;/I&gt;&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Sat, 11 Apr 2026 21:05:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/154146#M54071</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-04-11T21:05:19Z</dc:date>
    </item>
    <item>
      <title>Re: Lakeflow SDP expectations</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/154224#M54073</link>
      <description>&lt;P&gt;Thanks Ashwin&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 12 Apr 2026 18:43:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/154224#M54073</guid>
      <dc:creator>IM_01</dc:creator>
      <dc:date>2026-04-12T18:43:13Z</dc:date>
    </item>
  </channel>
</rss>

