Part 2 of my series on building an enterprise data platform on Databricks - this one's about Silver.
Part 1 covered why we ran two ingestion paths in parallel (GoldenGate CDC + JDBC batch) and kept them as separate bronze tables. If you missed it:
https://medium.com/@savlahanish/why-we-used-two-bronze-tables-instead-of-one-and-why-it-mattered-9c4...
Part 2 is where it got harder.
When both Bronze tables exist simultaneously, you inevitably end up with the same logical record in two places - captured differently, timestamped differently, and with neither timestamp fully reliable on its own.
Three things this covers that most CDC tutorials don't:
- The 5-minute overlap window where _ingest_time alone gives you the wrong answer - and the tiebreaker we added to fix it
- How CDC DELETE events silently keep deleted SAP records alive in Silver if you don't handle them explicitly in your MERGE statement
- The natural key mistake we made on one table - only caught when a business analyst noticed transaction counts in Silver didn't match SAP
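To make the first two points concrete, here is a rough sketch in plain Python (not Spark, and not our production code). It assumes each Bronze record carries a hypothetical source-side commit timestamp `src_commit_ts`, a `source` tag and an `op` code; those field names and the tiebreaker order are illustrative assumptions, and the actual rule we used is in the full post.

```python
from datetime import timedelta

OVERLAP = timedelta(minutes=5)  # the overlap window mentioned above


def pick_winner(a, b):
    """Choose which of two records for the same key should reach Silver."""
    if abs(a["_ingest_time"] - b["_ingest_time"]) > OVERLAP:
        # Outside the overlap window, ingest time is trusted on its own.
        return a if a["_ingest_time"] > b["_ingest_time"] else b

    # Inside the window, ingest time is ambiguous. Assumed tiebreaker:
    # latest source commit time first, then prefer the CDC path.
    def rank(r):
        return (r["src_commit_ts"], r["source"] == "cdc")

    return a if rank(a) >= rank(b) else b


def merge_into_silver(silver, records):
    """Upsert winners into `silver` (a dict by key) and apply deletes."""
    winners = {}
    for rec in records:
        current = winners.get(rec["key"])
        winners[rec["key"]] = rec if current is None else pick_winner(current, rec)

    for key, rec in winners.items():
        if rec["op"] == "D":
            # Without this branch, the deleted row would stay in Silver.
            silver.pop(key, None)
        else:
            silver[key] = rec
    return silver
```

In a Delta MERGE, the delete branch would correspond to a WHEN MATCHED clause that deletes rows whose change record is a DELETE, rather than treating every matched row as an update.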
Full post: https://medium.com/@savlahanish/the-hardest-part-of-our-sap-migration-wasnt-the-data-it-was-timing-e...
Has anyone else hit timing issues during the initial load window on a similar migration?
Curious how others handled the overlap period between snapshot and streaming.