Incremental load from two tables

garciargs
New Contributor III

Hi, 

I am looking to build a ETL process for a incremental load silver table.

This silver table, lets say "contracts_silver", is built by joining two bronze tables, "contracts_raw" and "customer".

contracts_silver

CONTRACT_IDSTATUSCUSTOMER_NAME
1SIGNEDPeter Smith
2SIGNEDJohn Smith

contracts_raw

IDSTATUSDATECUSTOMER_ID
1SIGNED2025-01-151
2SIGNED2025-01-152

customer_raw

IDNAMEDOB
1Peter Smith2025-01-15
2John Smith2025-01-15

The "contracts_raw" table will grow faster than "customer_raw".

Updates are supposed to be in batch. In any update, I can have inserts, updates or deletes on both raw tables.

Considering that at any given time I can have updates happening on both raw tables, or only one of then, is databricks capable auomatically detecting the need to update the "contracts_silver" table if:

  • Update 1: in contracts_raw, ID 1 is changed to STATUS cancelled
  • Update 2: in customer_raw, ID2, name is changed to John J. Smith

The goal is to always reprocess, but only rows that will eventually change in the silver table.