cancel
Showing results for 
Search instead for 
Did you mean: 
Community Discussions
cancel
Showing results for 
Search instead for 
Did you mean: 

How to facilitate incremental updates to an SCD Type 1 table that uses SCD Type 2 source tables

mvmiller
New Contributor III

I have an SCD Type 1 delta table (target) for which I am trying to figure out how to facilitate insert, updates, and deletes.  This table is sourced by multiple delta tables, with an SCD Type 2 structure, which are joined together to create the target table.  The goal is to implement an incremental load, such that 1) only records in the target table that are impacted by changes to the source tables are updated or deleted, and 2) only new records from the source that are not currently in the target table are inserted.

Each of the SCD Type 2 source tables are updated independently of one another. So it's possible that no tables, one table, or several source tables may have incremented versions since the prior time a load to the target table occurred.

I am trying to think through an process for how to facilitate incremental updates to the target table that does not require a full reload of the table.  Thoughts and advice or much appreciated.

2 REPLIES 2

mvmiller
New Contributor III

Correction (I can't seem to edit or remove original post):

- "... trying to think through an process" --> *a* process

- "Thoughts and advice or much appreciated" --> Thoughts and/or advice are much appreciated.

Kaniz
Community Manager
Community Manager

Hi @mvmiller, Implementing incremental updates for your SCD Type 1 delta table can be achieved using some effective strategies.

 

Let’s explore a few approaches:

 

Delta Lake and Slowly Changing Dimensions (SCD):

  • Delta Lake, with its support for ACID transactions and schema enforcement, provides a robust foundation for handling SCDs.
  • SCDs represent attributes that change over time, and they fall into different categories (Type 1, Type 2, etc.).
  • For your SCD Type 1 delta table, where history is not retained, you can directly update records. No historical versions are maintained for updated records.
  • For SCD Type 2, which retains a history of records, you have two options:
    • Type 2a: Retain history for all updates.
    • Type 2b: Retain history only for updates to specific columns.
  • Delta Live Tables also supports updating tables with SCD Type 1 and Type 2 structures.

Incremental Updates Process:

  • Here’s a high-level process for incremental updates:
    1. Identify Changes in Source Tables:
      • Monitor each SCD Type 2 source table independently.
      • Detect changes (inserts, updates, deletes) since the last load to the target table.
    2. Delta Merge Operation:
      • Use the Delta Lake MERGE operation to efficiently apply changes to the target table.
      • The MERGE operation combines insert, update, and delete actions in a single transaction.
      • Example (using Databricks):# Convert target table to Delta deltaTable = DeltaTable.forName(spark, "target_scd1") # Merge Delta table with new dataset deltaTable.alias("original")  .merge(scd1Temp.alias("updates"), "original.employee_id = updates.employee_id")  .whenMatchedUpdateAll()  .whenNotMatchedInsertAll()  .execute()
    3. Load New Records:
      • Insert new records from the source tables that are not currently in the target table.
    4. Schedule Incremental Loads:
      • Set up a regular schedule to run the incremental updates process.
      • Consider using triggers (e.g., event-based triggers) to initiate updates when changes occur in the source tables.

Considerations:

  • Ensure proper data lineage and metadata management to track changes and maintain data accuracy.
  • Monitor performance and optimize the process as needed.
  • Test thoroughly to handle scenarios where multiple source tables have incremented versions.