Data Engineering

Incorrect dropped rows count in DLT Event log

erigaud
Honored Contributor

Hello, 

I'm using a DLT pipeline with expectations of the form

expect_or_drop(...)

To test it, I added files containing records that should be dropped, and indeed when running the pipeline I can see that some rows were dropped.
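
For illustration, the expectation is declared roughly like this (the table name, column, rule name and source path below are simplified placeholders, not my actual pipeline):

import dlt

@dlt.table(comment="Bronze table with a row-dropping expectation")
@dlt.expect_or_drop("valid_id", "id IS NOT NULL")
def bronze_events():
    # Auto Loader source; the test files I added include rows with NULL ids,
    # which the expectation should drop
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/path/to/landing/")  # placeholder path
    )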

However when looking at the DLT Event logs, in the field 

details.flow_progress.data_quality.dropped_records

I always get 0, and the number of failing rows is only available in the field 

details.flow_progress.data_quality.expectations.failed_records
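
For reference, this is roughly how I read those metrics (the pipeline ID is a placeholder, and I'm assuming the event_log() table-valued function; reading the <storage-location>/system/events Delta table works the same way):

# Pull dropped_records and the per-expectation failed_records from the event log.
# details is a JSON string, so the fields are extracted with the colon path syntax.
metrics = spark.sql("""
  SELECT
    timestamp,
    origin.flow_name AS flow_name,
    details:flow_progress.data_quality.dropped_records AS dropped_records,
    expectation.name AS expectation_name,
    expectation.failed_records AS failed_records
  FROM (
    SELECT
      *,
      explode(
        from_json(
          details:flow_progress.data_quality.expectations,
          'array<struct<name string, dataset string, passed_records bigint, failed_records bigint>>'
        )
      ) AS expectation
    FROM event_log('<pipeline-id>')
    WHERE event_type = 'flow_progress'
      AND details:flow_progress.data_quality IS NOT NULL
  )
""")
metrics.show(truncate=False)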

Is this the expected behavior?

3 REPLIES

Priyanka_Biswas
Databricks Employee

Hello @erigaud 

The issue is that the details.flow_progress.data_quality.dropped_records field stays at 0 even though records are being dropped. This can happen when the expect_or_drop expectation only updates the failed_records field under details.flow_progress.data_quality.expectations rather than the top-level dropped_records field. First, check the DLT event log to confirm that failed_records is incrementing as expected; if it is, the discrepancy is limited to dropped_records. As a workaround, try the expect_all_or_drop operator, which should populate dropped_records correctly. The code modification would look like this:


import dlt

@dlt.table
@dlt.expect_all_or_drop({"valid_count": "count > 0", "valid_current_page": "current_page_id IS NOT NULL AND current_page_title IS NOT NULL"})
def raw_data():
    # Create the raw dataset -- replace the source below with your actual input
    return spark.read.format("json").load("/path/to/raw-data/")  # placeholder path

 

erigaud
Honored Contributor

Tried updating the code to expect_all_or_drop, but it hasn't changed anything: the dropped records column is still 0. In which cases is this column filled?

erigaud
Honored Contributor

Hello @Retired_mod 

I understand that dropped records and failed records are tracked separately, but my complaint is that the column for dropped records in my case is always 0, even though I have expectations that drop rows. 

For example, I ran a workflow that dropped 5 rows due to an expect_or_drop expectation, and when I checked the event log I only saw 5 failed records and 0 dropped records. Is that normal, or is it a bug? It seems like dropped records aren't tracked properly.
