Applying SCD in DLT using 3 different tables at source
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
11-28-2023 12:59 AM
My organization has recently started using Delta Live Tables in Databricks for data modeling. One of the dimensions I am trying to model takes data from 3 existing tables in the data lake and needs to be slowly changing dimensions (SCD Type 1).
This appears to be quite simple if you have just one table, according to the Databricks documentation (https://docs.databricks.com/en/delta-live-tables/cdc.html). I'm struggling when trying to make the final dimension a composite of 3 source tables.
For example, let's say the tables are: person, personDetail, and job.
person:
1 | Name1 |
2 | Name2 |
personDetail:
1 | 1 | detail1 |
2 | 2 | detail2 |
job:
1 | job1 |
2 | job2 |
The final dimension needs to have personID, detail, and jobName, and show any changes to any of the 3 tables.
Other than making a complicated union in Pyspark at the beginning of the code, I can't think of a way to make this work, perhaps creating each table as a separate streaming table and then joining them. Any suggestions on how this can be handled in a simplified way?

