cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Applying SCD in DLT using 3 different tables at source

nag_kanchan
New Contributor III

My organization has recently started using Delta Live Tables in Databricks for data modeling. One of the dimensions I am trying to model takes data from 3 existing tables in the data lake and needs to be slowly changing dimensions (SCD Type 1).

This appears to be quite simple if you have just one table, according to the Databricks documentation (https://docs.databricks.com/en/delta-live-tables/cdc.html). I'm struggling when trying to make the final dimension a composite of 3 source tables.

For example, let's say the tables are: person, personDetail, and job.

person:

PersonID Name
1Name1
2Name2

personDetail:

PersonID JobID detail
11detail1
22detail2

job:

jobID jobName
1job1
2job2

The final dimension needs to have personID, detail, and jobName, and show any changes to any of the 3 tables.

Other than making a complicated union in Pyspark at the beginning of the code, I can't think of a way to make this work, perhaps creating each table as a separate streaming table and then joining them. Any suggestions on how this can be handled in a simplified way?

0 REPLIES 0

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now