My organization has recently started using Delta Live Tables in Databricks for data modeling. One of the dimensions I am trying to model takes data from 3 existing tables in the data lake and needs to be slowly changing dimensions (SCD Type 1).
This appears to be quite simple if you have just one table, according to the Databricks documentation (https://docs.databricks.com/en/delta-live-tables/cdc.html). I'm struggling when trying to make the final dimension a composite of 3 source tables.
For example, let's say the tables are: person, personDetail, and job.
person:
personDetail:
job:
The final dimension needs to have personID, detail, and jobName, and show any changes to any of the 3 tables.
Other than making a complicated union in Pyspark at the beginning of the code, I can't think of a way to make this work, perhaps creating each table as a separate streaming table and then joining them. Any suggestions on how this can be handled in a simplified way?