cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Applying SCD in DLT using 3 different tables at source

nag_kanchan
New Contributor III

My organization has recently started using Delta Live Tables in Databricks for data modeling. One of the dimensions I am trying to model takes data from 3 existing tables in the data lake and needs to be slowly changing dimensions (SCD Type 1).

This appears to be quite simple if you have just one table, according to the Databricks documentation (https://docs.databricks.com/en/delta-live-tables/cdc.html). I'm struggling when trying to make the final dimension a composite of 3 source tables.

For example, let's say the tables are: person, personDetail, and job.

person:

PersonID Name
1Name1
2Name2

personDetail:

PersonID JobID detail
11detail1
22detail2

job:

jobID jobName
1job1
2job2

The final dimension needs to have personID, detail, and jobName, and show any changes to any of the 3 tables.

Other than making a complicated union in Pyspark at the beginning of the code, I can't think of a way to make this work, perhaps creating each table as a separate streaming table and then joining them. Any suggestions on how this can be handled in a simplified way?

0 REPLIES 0
Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.