- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-17-2025 11:59 AM
Hi,
I am looking to build a ETL process for a incremental load silver table.
This silver table, lets say "contracts_silver", is built by joining two bronze tables, "contracts_raw" and "customer".
contracts_silver
| CONTRACT_ID | STATUS | CUSTOMER_NAME |
| 1 | SIGNED | Peter Smith |
| 2 | SIGNED | John Smith |
contracts_raw
| ID | STATUS | DATE | CUSTOMER_ID |
| 1 | SIGNED | 2025-01-15 | 1 |
| 2 | SIGNED | 2025-01-15 | 2 |
customer_raw
| ID | NAME | DOB |
| 1 | Peter Smith | 2025-01-15 |
| 2 | John Smith | 2025-01-15 |
The "contracts_raw" table will grow faster than "customer_raw".
Updates are supposed to be in batch. In any update, I can have inserts, updates or deletes on both raw tables.
Considering that at any given time I can have updates happening on both raw tables, or only one of then, is databricks capable auomatically detecting the need to update the "contracts_silver" table if:
- Update 1: in contracts_raw, ID 1 is changed to STATUS cancelled
- Update 2: in customer_raw, ID2, name is changed to John J. Smith
The goal is to always reprocess, but only rows that will eventually change in the silver table.
- Labels:
-
Workflows