Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Need advice from someone with practical experience in DLT

Fatimah-Tariq
New Contributor II

Hi, I'm facing a scenario in my DLT pipeline. In my silver layer I filter out test records to prevent my test data from reaching the silver schema, and at the end I use apply_changes to create the tables, with a sequence_by clause to keep the most recently updated version of each record.

The issue: when the most recent version of a record has its test flag set to true, it becomes a test record and my filter drops it. By the time the data reaches the sequence_by clause inside apply_changes, that latest entry is already gone, and since the stream never delivers a delete, the previous version of the record is considered the most recent one and gets forwarded to the silver schema. In reality, that version is now outdated and we do not want it in silver.

In short, outdated records are moving forward to the silver schema because of the filtering.

What is the best approach to handle this situation?

 

This is my silver layer's code structure:

@dlt.view(name=bronze_dlt_view)
def bronze_source():
    # code to fetch tables from bronze and apply filtering
    ...

dlt.create_streaming_table(
    name=silver_table,
    table_properties=table_props,
    comment="Silver table with MERGE into logic from bronze",
)

dlt.apply_changes(
    target=silver_table,
    source=bronze_dlt_view,
    keys=primary_keys,
    sequence_by=col(sequence_col),
    stored_as_scd_type=1,
)
1 REPLY

Alberto_Umana
Databricks Employee

To address the issue of outdated records moving forward to the silver schema in your Delta Live Tables (DLT) pipeline, consider the following approach:

Modify the filtering logic: instead of filtering out the test records before the apply_changes call, handle the filtering within apply_changes itself. This way the sequence of records for each key is maintained correctly, and outdated versions are not propagated forward.

Use the apply_as_deletes parameter: the apply_as_deletes parameter of apply_changes marks records as deleted based on an expression, such as your test flag. Records whose most recent version has the flag set to true are then treated as deletions of that key instead of silently vanishing from the stream, so the previous version is not carried forward.
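A minimal sketch of what that could look like, assuming the bronze view keeps the test rows and exposes a boolean column (here hypothetically named is_test). This is a pipeline-definition fragment: it only runs inside a Databricks DLT pipeline, not as standalone code.

```python
import dlt
from pyspark.sql.functions import col, expr

# The bronze view must KEEP the test rows so apply_changes can see the
# latest version of each key, including versions flagged as test data.
dlt.apply_changes(
    target=silver_table,
    source=bronze_dlt_view,
    keys=primary_keys,
    sequence_by=col(sequence_col),
    # is_test is an assumed column name; when the winning version of a key
    # matches this expression, the key is deleted from the target instead
    # of the stale previous version surviving.
    apply_as_deletes=expr("is_test = true"),
    stored_as_scd_type=1,
)
```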

By handling the filtering within apply_changes, you ensure that the most recent version of each record is processed correctly and outdated records are not moved forward to the silver schema.
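To see why pre-filtering loses the delete while in-merge delete semantics keep silver correct, here is a small plain-Python simulation (no Spark or DLT required; merge_scd1 is an illustrative stand-in for sequence_by plus SCD type 1, not the DLT implementation):

```python
# Each event is (key, sequence_number, value, is_test).
def merge_scd1(events, treat_test_as_delete):
    """Keep the highest-sequence event per key (SCD type 1 semantics)."""
    latest = {}
    for key, seq, value, is_test in events:
        if key not in latest or seq > latest[key][0]:
            latest[key] = (seq, value, is_test)
    silver = {}
    for key, (seq, value, is_test) in latest.items():
        if is_test and treat_test_as_delete:
            continue  # apply_as_deletes-style: the key is removed entirely
        silver[key] = value
    return silver

events = [
    ("a", 1, "v1", False),
    ("a", 2, "v2-test", True),   # latest version of "a" is now test data
    ("b", 1, "v1", False),
]

# Filtering test rows BEFORE the merge resurrects the stale version of "a":
prefiltered = [e for e in events if not e[3]]
print(merge_scd1(prefiltered, treat_test_as_delete=False))
# -> {'a': 'v1', 'b': 'v1'}   (outdated "a" survives)

# Letting test rows reach the merge and treating them as deletes:
print(merge_scd1(events, treat_test_as_delete=True))
# -> {'b': 'v1'}              ("a" is correctly removed)
```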

Please see: https://docs.databricks.com/en/delta-live-tables/cdc.html
