Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Need advice from someone with practical experience in DLT

Fatimah-Tariq
New Contributor II

Hi, I'm facing a scenario in my DLT pipeline. In my silver layer I filter out test records to prevent my test data from reaching the silver schema, and at the end I use apply_changes to create the tables, with a sequence_by clause to keep the most recently updated version of each record.

The issue: when the most recent version of a record has its test flag set to true, it becomes a test record and my filter drops it. By the time the data reaches the sequence_by clause inside apply_changes, that latest entry is already gone, and since the stream never delivers a delete, the previous version of the record is considered the most recent one and gets forwarded to the silver schema. In reality, that version is now outdated and we do not want it in silver.

In short, outdated records are moving forward to the silver schema because of the filtering.

What is the best approach to handle this situation?

 

This is my silver layer's code structure:

@dlt.view(name=bronze_dlt_view)
def bronze_source():
    # code to fetch tables from bronze and apply filtering
    ...

dlt.create_streaming_table(
    name=silver_table,
    table_properties=table_props,
    comment="Silver table with MERGE into logic from bronze",
)

dlt.apply_changes(
    target=silver_table,
    source=bronze_dlt_view,
    keys=primary_keys,
    sequence_by=col(sequence_col),
    stored_as_scd_type=1,
)
1 REPLY

Alberto_Umana
Databricks Employee

To address the issue of outdated records moving forward to the silver schema in your Delta Live Tables (DLT) pipeline, consider the following approach:

Modify the filtering logic: instead of filtering out the test records before the apply_changes call, handle the filtering within apply_changes itself. This way the sequence of records for each key is maintained correctly, and outdated versions are not propagated forward.

Use the apply_as_deletes parameter: the apply_as_deletes parameter of apply_changes marks records as deleted based on an expression, such as your test flag. Records whose most recent version has the flag set to true are then treated as deletions of that key instead of silently vanishing from the stream, so the previous version is not carried forward.
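A minimal sketch of what that could look like, assuming the bronze view keeps the test rows and exposes a boolean column (here hypothetically named is_test). This is a pipeline-definition fragment: it only runs inside a Databricks DLT pipeline, not as standalone code.

```python
import dlt
from pyspark.sql.functions import col, expr

# The bronze view must KEEP the test rows so apply_changes can see the
# latest version of each key, including versions flagged as test data.
dlt.apply_changes(
    target=silver_table,
    source=bronze_dlt_view,
    keys=primary_keys,
    sequence_by=col(sequence_col),
    # is_test is an assumed column name; when the winning version of a key
    # matches this expression, the key is deleted from the target instead
    # of the stale previous version surviving.
    apply_as_deletes=expr("is_test = true"),
    stored_as_scd_type=1,
)
```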

By handling the filtering within apply_changes, you ensure that the most recent version of each record is processed correctly and outdated records are not moved forward to the silver schema.
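To see why pre-filtering loses the delete while in-merge delete semantics keep silver correct, here is a small plain-Python simulation (no Spark or DLT required; merge_scd1 is an illustrative stand-in for sequence_by plus SCD type 1, not the DLT implementation):

```python
# Each event is (key, sequence_number, value, is_test).
def merge_scd1(events, treat_test_as_delete):
    """Keep the highest-sequence event per key (SCD type 1 semantics)."""
    latest = {}
    for key, seq, value, is_test in events:
        if key not in latest or seq > latest[key][0]:
            latest[key] = (seq, value, is_test)
    silver = {}
    for key, (seq, value, is_test) in latest.items():
        if is_test and treat_test_as_delete:
            continue  # apply_as_deletes-style: the key is removed entirely
        silver[key] = value
    return silver

events = [
    ("a", 1, "v1", False),
    ("a", 2, "v2-test", True),   # latest version of "a" is now test data
    ("b", 1, "v1", False),
]

# Filtering test rows BEFORE the merge resurrects the stale version of "a":
prefiltered = [e for e in events if not e[3]]
print(merge_scd1(prefiltered, treat_test_as_delete=False))
# -> {'a': 'v1', 'b': 'v1'}   (outdated "a" survives)

# Letting test rows reach the merge and treating them as deletes:
print(merge_scd1(events, treat_test_as_delete=True))
# -> {'b': 'v1'}              ("a" is correctly removed)
```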

Please see: https://docs.databricks.com/en/delta-live-tables/cdc.html
