cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Applying SCD in DLT using 3 different tables at source

nag_kanchan
New Contributor III

My organization has recently started using Delta Live Tables in Databricks for data modeling. One of the dimensions I am trying to model takes data from 3 existing tables in the data lake and needs to be slowly changing dimensions (SCD Type 1).

This appears to be quite simple if you have just one table, according to the Databricks documentation (https://docs.databricks.com/en/delta-live-tables/cdc.html). I'm struggling when trying to make the final dimension a composite of 3 source tables.

For example, let's say the tables are: person, personDetail, and job.

person:

PersonID Name
1Name1
2Name2

personDetail:

PersonID JobID detail
11detail1
22detail2

job:

jobID jobName
1job1
2job2

The final dimension needs to have personID, detail, and jobName, and show any changes to any of the 3 tables.

Other than making a complicated union in Pyspark at the beginning of the code, I can't think of a way to make this work, perhaps creating each table as a separate streaming table and then joining them. Any suggestions on how this can be handled in a simplified way?

0 REPLIES 0

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group