<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Lakeflow SDP expectations in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/154146#M54071</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/193958"&gt;@IM_01&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;event_log(TABLE(...)) always returns the entire pipeline’s event log, not just rows for that one dataset. Passing a table is just a shortcut to find the owning pipeline. It doesn’t filter the log to that table.&lt;/P&gt;
&lt;P&gt;To restrict the results to a specific table such as customers_summary_mv, add an explicit filter on the flow name (and usually the event type), as in the sample below.&lt;/P&gt;
&lt;DIV class="l8rrz21 _1ibi0s3do" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-sql p8i6j0e hljs language-sql _12n1b832"&gt;&lt;SPAN class="hljs-keyword"&gt;SELECT&lt;/SPAN&gt; &lt;SPAN class="hljs-operator"&gt;*&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;FROM&lt;/SPAN&gt; event_log(&lt;SPAN class="hljs-keyword"&gt;TABLE&lt;/SPAN&gt;(workspace.default.customers_summary_mv))
&lt;SPAN class="hljs-keyword"&gt;WHERE&lt;/SPAN&gt; event_type &lt;SPAN class="hljs-operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;'flow_progress'&lt;/SPAN&gt;
  &lt;SPAN class="hljs-keyword"&gt;AND&lt;/SPAN&gt; origin.flow_name &lt;SPAN class="hljs-operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;'customers_summary_mv'&lt;/SPAN&gt;;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="l8rrz23 _1ibi0s3d7 _1ibi0s332 _1ibi0s3dp _1ibi0s3bm _1ibi0s3ce"&gt;
&lt;DIV class="l8rrz25 _1ibi0s3dc"&gt;What are you looking for?&amp;nbsp; E&lt;SPAN&gt;xpectation metrics for that table?&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV class="l8rrz25 _1ibi0s3dc"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="l8rrz25 _1ibi0s3dc"&gt;
&lt;P class="p1"&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;STRONG&gt;&lt;I&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/I&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;I&gt;&lt;/I&gt;&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
    <pubDate>Sat, 11 Apr 2026 21:05:19 GMT</pubDate>
    <dc:creator>Ashwin_DSA</dc:creator>
    <dc:date>2026-04-11T21:05:19Z</dc:date>
    <item>
      <title>Lakeflow SDP expectations</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/153553#M53970</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there a way to get the number of warned, dropped, and failed records for each expectation? Currently I only see an aggregated count.&lt;/P&gt;</description>
      <pubDate>Mon, 06 Apr 2026 18:25:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/153553#M53970</guid>
      <dc:creator>IM_01</dc:creator>
      <dc:date>2026-04-06T18:25:44Z</dc:date>
    </item>
    <item>
      <title>Re: Lakeflow SDP expectations</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/153555#M53971</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/193958"&gt;@IM_01&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;You can’t change the UI to break out those numbers, but you can get per-expectation counts from the DLT (Lakeflow) event log. Each expectation entry records passed_records and failed_records; for EXPECT rules failed_records = warned rows, and for EXPECT … DROP ROW rules failed_records = dropped rows. Expectations configured with FAIL UPDATE don’t emit aggregate metrics.&lt;/P&gt;
&lt;P&gt;Here is a sample query you can run; just replace the DLT table name where it says my_dlt_table.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;WITH exploded AS (
  SELECT
    timestamp,
    explode(
      from_json(
        details:flow_progress:data_quality:expectations,
        'array&amp;lt;struct&amp;lt;name:string,dataset:string,passed_records:long,failed_records:long&amp;gt;&amp;gt;'
      )
    ) AS e
  FROM event_log(TABLE(my_dlt_table))
  WHERE details:flow_progress:data_quality IS NOT NULL
)
SELECT
  timestamp,
  e.name           AS expectation_name,
  e.dataset,
  e.passed_records,
  e.failed_records
FROM exploded
ORDER BY timestamp DESC, expectation_name;&lt;/LI-CODE&gt;
&lt;P&gt;I tested it on a sample table and it returned the per-expectation split. Is this what you were looking to see?&lt;/P&gt;
&lt;DIV id="tinyMceEditor_168935259e0521Ashwin_DSA_0" class="mceNonEditable lia-copypaste-placeholder"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="DLT_Expectations.png" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/25717i986B9C218DF6BC3E/image-size/large?v=v2&amp;amp;px=999" role="button" title="DLT_Expectations.png" alt="DLT_Expectations.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You can also take a look at the documentation &lt;/SPAN&gt;&lt;A style="font-family: inherit; background-color: #ffffff;" href="https://docs.databricks.com/aws/en/ldp/monitor-event-logs#data-quality-metrics" target="_blank"&gt;here&lt;/A&gt;&lt;SPAN&gt; for exploring&amp;nbsp;data quality / expectations metrics from the event log.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Hope this helps.&lt;/P&gt;
&lt;P class="p1"&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;STRONG&gt;&lt;I&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/I&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;I&gt;&lt;/I&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Apr 2026 19:05:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/153555#M53971</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-04-06T19:05:16Z</dc:date>
    </item>
    <item>
      <title>Re: Lakeflow SDP expectations</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/153651#M53987</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/216690"&gt;@Ashwin_DSA&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Apologies, I was referring to the event_log in this case:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;{"dropped_records":0,"warned_records":344,"expectations":[{"name":"valid_case1","dataset":"cat.sch.tb1","passed_records":2505,"failed_records":313},{"name":"valid_case2","dataset":"cat.sch.tb1","passed_records":2719,"failed_records":99}]}&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;So warned_records gives an aggregated count, right? And if fail is the action, does it just give failed_records in the expectations dictionary and no passed_records?&lt;/P&gt;</description>
      <pubDate>Tue, 07 Apr 2026 19:25:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/153651#M53987</guid>
      <dc:creator>IM_01</dc:creator>
      <dc:date>2026-04-07T19:25:49Z</dc:date>
    </item>
    <item>
      <title>Re: Lakeflow SDP expectations</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/153843#M54024</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/216690"&gt;@Ashwin_DSA&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Even when passing a table name to the event_log function, it returns all the rows. Could you please help with this?&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;select * from event_log(table(workspace.default.customers_summary_mv))&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 09 Apr 2026 08:44:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/153843#M54024</guid>
      <dc:creator>IM_01</dc:creator>
      <dc:date>2026-04-09T08:44:30Z</dc:date>
    </item>
    <item>
      <title>Re: Lakeflow SDP expectations</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/154145#M54070</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/193958"&gt;@IM_01&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Warned_records / dropped_records (top-level):&lt;/STRONG&gt;&amp;nbsp;These are aggregated per-dataset counts of unique rows that were warned or dropped in that micro-batch/update.&amp;nbsp;They are not a simple sum of failed_records across expectations, because the same row can fail multiple expectations. That row is counted once in warned_records but multiple times in expectations[*].failed_records.&lt;/P&gt;
&lt;P&gt;That’s why in your example:&lt;/P&gt;
&lt;DIV data-ui-element="code-block-container"&gt;
&lt;PRE&gt;"warned_records": 344,
"expectations": [
  {"name":"valid_case1", ... "failed_records":313},
  {"name":"valid_case2", ... "failed_records":99}
]&lt;/PRE&gt;
&lt;DIV&gt;
&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P&gt;344 ≠ 313 + 99 because some rows likely violated both valid_case1 and valid_case2.&lt;/P&gt;
&lt;P&gt;For fail expectations, the update aborts on the first violation, and data quality metrics are not recorded the same way as for warn/drop.&amp;nbsp;In practice, you do not get meaningful passed_records/failed_records metrics for that expectation in details.flow_progress.data_quality. Instead, the event log records an expectation-violation error event (with the expectation name and the offending record), but no aggregate count of how many rows would have failed. So yes, warned_records is an aggregated, deduplicated count at the dataset level, and no, a FAIL action does not behave like warn/drop in the metrics JSON: you generally won’t see reliable passed_records/failed_records for it, just the failure event.&lt;/P&gt;
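&lt;P&gt;As a quick illustration of the deduplication, here is a small plain-Python sketch (hypothetical data, not a Databricks API) showing how one row that violates two expectations is counted once in warned_records but twice across the per-expectation failed_records:&lt;/P&gt;

```python
# Hypothetical rows, each tagged with the expectations it violated.
rows = [
    {"id": 1, "violations": ["valid_case1"]},
    {"id": 2, "violations": ["valid_case1", "valid_case2"]},
    {"id": 3, "violations": []},
]

# Dataset-level count: each violating row is counted exactly once.
warned_records = sum(1 for r in rows if r["violations"])

# Per-expectation counts: a row is counted once per violated expectation.
failed_by_expectation = {}
for r in rows:
    for name in r["violations"]:
        failed_by_expectation[name] = failed_by_expectation.get(name, 0) + 1

print(warned_records)                       # 2 (rows 1 and 2, deduplicated)
print(sum(failed_by_expectation.values()))  # 3 (row 2 counted twice)
```

&lt;P&gt;The same effect explains the real numbers above: the sum of per-expectation failures exceeds the deduplicated dataset-level count whenever rows overlap.&lt;/P&gt;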
&lt;P class="p1"&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;STRONG&gt;&lt;I&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/I&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;I&gt;&lt;/I&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 11 Apr 2026 21:02:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/154145#M54070</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-04-11T21:02:59Z</dc:date>
    </item>
    <item>
      <title>Re: Lakeflow SDP expectations</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/154146#M54071</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/193958"&gt;@IM_01&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;event_log(TABLE(...)) always returns the entire pipeline’s event log, not just rows for that one dataset. Passing a table is just a shortcut to find the owning pipeline. It doesn’t filter the log to that table.&lt;/P&gt;
&lt;P&gt;To restrict the results to a specific table such as customers_summary_mv, add an explicit filter on the flow name (and usually the event type), as in the sample below.&lt;/P&gt;
&lt;DIV class="l8rrz21 _1ibi0s3do" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-sql p8i6j0e hljs language-sql _12n1b832"&gt;&lt;SPAN class="hljs-keyword"&gt;SELECT&lt;/SPAN&gt; &lt;SPAN class="hljs-operator"&gt;*&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;FROM&lt;/SPAN&gt; event_log(&lt;SPAN class="hljs-keyword"&gt;TABLE&lt;/SPAN&gt;(workspace.default.customers_summary_mv))
&lt;SPAN class="hljs-keyword"&gt;WHERE&lt;/SPAN&gt; event_type &lt;SPAN class="hljs-operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;'flow_progress'&lt;/SPAN&gt;
  &lt;SPAN class="hljs-keyword"&gt;AND&lt;/SPAN&gt; origin.flow_name &lt;SPAN class="hljs-operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;'customers_summary_mv'&lt;/SPAN&gt;;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="l8rrz23 _1ibi0s3d7 _1ibi0s332 _1ibi0s3dp _1ibi0s3bm _1ibi0s3ce"&gt;
&lt;DIV class="l8rrz25 _1ibi0s3dc"&gt;What are you looking for?&amp;nbsp; E&lt;SPAN&gt;xpectation metrics for that table?&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV class="l8rrz25 _1ibi0s3dc"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="l8rrz25 _1ibi0s3dc"&gt;
&lt;P class="p1"&gt;&lt;FONT size="2" color="#FF6600"&gt;&lt;STRONG&gt;&lt;I&gt;If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.&lt;/I&gt;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;I&gt;&lt;/I&gt;&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Sat, 11 Apr 2026 21:05:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/154146#M54071</guid>
      <dc:creator>Ashwin_DSA</dc:creator>
      <dc:date>2026-04-11T21:05:19Z</dc:date>
    </item>
    <item>
      <title>Re: Lakeflow SDP expectations</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/154224#M54073</link>
      <description>&lt;P&gt;Thanks Ashwin&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 12 Apr 2026 18:43:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-sdp-expectations/m-p/154224#M54073</guid>
      <dc:creator>IM_01</dc:creator>
      <dc:date>2026-04-12T18:43:13Z</dc:date>
    </item>
  </channel>
</rss>

