Incorrect dropped rows count in DLT Event log

erigaud
Honored Contributor

Hello, 

I'm using a DLT pipeline with expectations:

expect_or_drop(...)

To test it, I added files containing records that should be dropped, and when running the pipeline I can indeed see that some rows were dropped.

However when looking at the DLT Event logs, in the field 

details.flow_progress.data_quality.dropped_records

I always get 0, and the number of failing rows is only available in the field 

details.flow_progress.data_quality.expectations.failed_records

Is this the expected behavior?

Priyanka_Biswas
Databricks Employee

Hello @erigaud 

The symptom you describe is that details.flow_progress.data_quality.dropped_records stays at 0 even though records are being dropped. One possible explanation is that the expect_or_drop decorator does not update dropped_records in the DLT event log, and only updates failed_records under details.flow_progress.data_quality.expectations.

To confirm, check the failed_records field in the DLT event log. If it is updating correctly, the issue likely lies with the dropped_records field.

As a possible fix, try the expect_all_or_drop decorator, which should update dropped_records correctly. The code modification would look like this:


import dlt

@dlt.table
@dlt.expect_all_or_drop({"valid_count": "count > 0", "valid_current_page": "current_page_id IS NOT NULL AND current_page_title IS NOT NULL"})
def raw_data():
    # Create the raw dataset and return it as a DataFrame
    ...
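One thing to keep in mind when comparing the two counters: with expect_all_or_drop a row is dropped as soon as it fails any one of the expectations, while failed_records is reported per expectation. The plain-Python sketch below only illustrates that counting logic. It is not DLT code: the lambdas are stand-ins for the SQL constraints above, and the sample rows and the helper name apply_expect_all_or_drop are made up for illustration.

```python
# Python stand-ins for the two SQL constraints (illustrative only, not DLT code).
EXPECTATIONS = {
    "valid_count": lambda r: r["count"] > 0,
    "valid_current_page": lambda r: (
        r["current_page_id"] is not None and r["current_page_title"] is not None
    ),
}

# Made-up sample rows.
ROWS = [
    {"count": 3, "current_page_id": 1, "current_page_title": "A"},       # passes both
    {"count": 0, "current_page_id": 2, "current_page_title": "B"},       # fails valid_count
    {"count": 5, "current_page_id": None, "current_page_title": "C"},    # fails valid_current_page
    {"count": 0, "current_page_id": None, "current_page_title": None},   # fails both
]


def apply_expect_all_or_drop(rows, expectations):
    """Keep rows that pass every expectation; count failures per expectation."""
    failed = {name: 0 for name in expectations}
    kept = []
    for row in rows:
        failures = [name for name, check in expectations.items() if not check(row)]
        for name in failures:
            failed[name] += 1
        if not failures:
            kept.append(row)
    dropped = len(rows) - len(kept)
    return kept, failed, dropped


kept, failed, dropped = apply_expect_all_or_drop(ROWS, EXPECTATIONS)
print("kept rows:", len(kept))
print("failed per expectation:", failed)
print("dropped rows:", dropped)
```

Because a single row can fail several expectations, the per-expectation failure counts can add up to more than the number of rows actually dropped, so the two figures are not expected to match one-to-one.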

 

I tried updating the code to use expect_all_or_drop, but it hasn't changed anything: the dropped records column is still 0. In which cases is this column filled?

erigaud
Honored Contributor

Hello @Retired_mod 

I understand that dropped records and failed records are tracked separately, but my complaint is that the column for dropped records in my case is always 0, even though I have expectations that drop rows. 

For example, I ran a workflow that dropped 5 rows due to an expect_or_drop expectation, and when I checked the event log I saw only 5 failed records and 0 dropped records. Is that normal, or is it a bug? It seems like dropped records aren't tracked properly.
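To make the comparison concrete, here is a minimal sketch of how the two counters can be read side by side from the details payload of a single flow_progress event. The field paths are the ones quoted in this thread; the sample payload is hypothetical and simply mirrors the numbers above (5 failed, 0 dropped), and the helper name summarize_data_quality, the expectation name, and the assumption that details arrives as a JSON string are all illustrative.

```python
import json

# Hypothetical `details` payload for one flow_progress event.
# Values mirror the run described above; the expectation name is made up.
SAMPLE_DETAILS = json.dumps({
    "flow_progress": {
        "data_quality": {
            "dropped_records": 0,
            "expectations": [
                {"name": "valid_current_page", "failed_records": 5},
            ],
        }
    }
})


def summarize_data_quality(details_json):
    """Return (dropped_records, {expectation name: failed_records})."""
    details = json.loads(details_json)
    data_quality = details.get("flow_progress", {}).get("data_quality", {})
    # Treat a missing or null counter as 0.
    dropped = data_quality.get("dropped_records") or 0
    expectations = data_quality.get("expectations") or []
    failed = {e["name"]: e.get("failed_records") or 0 for e in expectations}
    return dropped, failed


dropped, failed = summarize_data_quality(SAMPLE_DETAILS)
print("dropped_records:", dropped)
print("failed_records by expectation:", failed)
```

Running the same extraction over every flow_progress event of an update would show whether dropped_records is ever non-zero for flows that have drop expectations.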