a week ago
Hi,
Is there a way to get the number of warned records, dropped records, and failed records for each expectation? Currently I see it only gives an aggregated count.
a week ago
Hi @IM_01,
You can't change the UI to break out those numbers, but you can get per-expectation counts from the DLT (Lakeflow) event log. Each expectation entry records passed_records and failed_records; for EXPECT rules failed_records = warned rows, and for EXPECT ... ON VIOLATION DROP ROW rules failed_records = dropped rows. Expectations configured with FAIL UPDATE don't emit aggregate metrics.
Here is a sample query you can run. Just replace the DLT table name where it says my_dlt_table.
WITH exploded AS (
  SELECT
    timestamp,
    explode(
      from_json(
        details:flow_progress:data_quality:expectations,
        'array<struct<name:string,dataset:string,passed_records:long,failed_records:long>>'
      )
    ) AS e
  FROM event_log(TABLE(my_dlt_table))
  WHERE details:flow_progress:data_quality IS NOT NULL
)
SELECT
  timestamp,
  e.name AS expectation_name,
  e.dataset,
  e.passed_records,
  e.failed_records
FROM exploded
ORDER BY timestamp DESC, expectation_name;
I tested it on a sample table and it returned the per-expectation split. Is this what you want to see?
You can also take a look at the documentation here for exploring data quality / expectations metrics from the event log.
Hope this helps.
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.
Tuesday
Hi @Ashwin_DSA
Apologies, I was referring to the event_log in this case:
{
  "dropped_records": 0,
  "warned_records": 344,
  "expectations": [
    {"name": "valid_case1", "dataset": "cat.sch.tb1", "passed_records": 2505, "failed_records": 313},
    {"name": "valid_case2", "dataset": "cat.sch.tb1", "passed_records": 2719, "failed_records": 99}
  ]
}
So warned_records gives the aggregated count, right? And if FAIL is the action, does it just give failed_records in the expectations dictionary and no passed_records?
Thursday
Hi @Ashwin_DSA
Even when passing the table name to the event_log function, it returns all the rows. Could you please help with this?
select * from event_log(table(workspace.default.customers_summary_mv))
Saturday
Hi @IM_01,
event_log(TABLE(...)) always returns the entire pipeline's event log, not just rows for that one dataset. Passing a table is just a shortcut to find the owning pipeline; it doesn't filter the log to that table.
To restrict the results to a specific table like customers_summary_mv, add an explicit filter on the flow name (and usually the event type). A sample is given below.
SELECT *
FROM event_log(TABLE(workspace.default.customers_summary_mv))
WHERE event_type = 'flow_progress'
AND origin.flow_name = 'customers_summary_mv';
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.
Saturday
Hi @IM_01,
warned_records / dropped_records (top-level): these are aggregated per-dataset counts of unique rows that were warned or dropped in that micro-batch/update. They are not a simple sum of failed_records across expectations, because the same row can fail multiple expectations. Such a row is counted once in warned_records but multiple times in expectations[*].failed_records.
Thatโs why in your example:
"warned_records": 344,
"expectations": [
{"name":"valid_case1", ... "failed_records":313},
{"name":"valid_case2", ... "failed_records":99}
]
344 ≠ 313 + 99: some rows likely violated both valid_case1 and valid_case2.
For FAIL expectations, the update aborts on the first violation, and data quality metrics are not recorded in the same way as for warn/drop. Practically, you do not get meaningful passed_records/failed_records metrics for that expectation in details.flow_progress.data_quality. Instead, you get an expectation violation error event (with the expectation name and offending record) in the event log / error message, but no aggregate counts of how many rows would have failed.
So, yes, warned_records is an aggregated, deduped count at the dataset level. And no, a FAIL action does not behave like warn/drop in the metrics JSON: you generally won't see reliable passed_records/failed_records for it, just the failure event.
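To make the dedup behavior concrete, here is a small Python sketch (not DLT code, just an illustration with made-up rows and hypothetical expectation names) of how per-expectation failed_records can sum to more than the dataset-level warned_records:

```python
# Illustration only: rows evaluated against two hypothetical expectations.
# A row that violates several expectations is counted once at the dataset
# level (warned_records) but once per expectation (failed_records).

rows = [
    {"id": 1, "valid_case1": True,  "valid_case2": True},
    {"id": 2, "valid_case1": False, "valid_case2": False},  # fails both
    {"id": 3, "valid_case1": False, "valid_case2": True},
    {"id": 4, "valid_case1": True,  "valid_case2": False},
]
expectations = ["valid_case1", "valid_case2"]

# Per-expectation counts, like expectations[*].failed_records
failed_records = {e: sum(not r[e] for r in rows) for e in expectations}

# Dataset-level count, like warned_records: unique failing rows
warned_records = sum(any(not r[e] for e in expectations) for r in rows)

print(failed_records)  # {'valid_case1': 2, 'valid_case2': 2}
print(warned_records)  # 3, which is less than 2 + 2: row 2 fails both
```

The same logic explains your numbers: 344 unique warned rows versus 313 + 99 = 412 total expectation failures.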
If this answer resolves your question, could you mark it as โAccept as Solutionโ? That helps other users quickly find the correct fix.
Sunday
Thanks Ashwin