The Hardest Part of Our SAP Migration Wasn't the Data. It Was Timing

Part 2 of my series on building an enterprise data platform on Databricks. This one is about the Silver layer.

Part 1 covered why we ran two ingestion paths in parallel (GoldenGate CDC + JDBC batch) and kept them as separate bronze tables. If you missed it:
https://medium.com/@savlahanish/why-we-used-two-bronze-tables-instead-of-one-and-why-it-mattered-9c4...

Part 2 is where it got harder.

When both Bronze tables exist at the same time, the same logical record inevitably ends up in two places: captured differently, timestamped differently, with neither timestamp fully reliable on its own.

Three things this covers that most CDC tutorials don't:

→ The 5-minute overlap window where _ingest_time alone gives you the wrong answer - and the tiebreaker we added to fix it

→ How CDC DELETE events silently keep deleted SAP records alive in Silver if you don't handle them explicitly in your MERGE statement

→ The natural key mistake we made on one table - only caught when a business analyst noticed transaction counts in Silver didn't match SAP
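
To make the first two points concrete, here is a minimal Python sketch of the idea, not the code from the full post. The post does not say here what the tiebreaker is, so `source_ts` (a commit timestamp from the source system), the `op` codes, the "prefer CDC on a tie" rule, and the names `BronzeRow`, `pick_winner`, and `apply_to_silver` are all assumptions made for illustration. Only the 5-minute window and `_ingest_time` come from the post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable

# The 5-minute overlap window mentioned in the post.
OVERLAP_WINDOW = timedelta(minutes=5)


@dataclass(frozen=True)
class BronzeRow:
    key: str               # natural key (assumed to be a single column here)
    source: str            # "cdc" or "batch"
    op: str                # assumed op codes: "I", "U", "D"
    ingest_time: datetime  # the _ingest_time column from the post
    source_ts: datetime    # ASSUMPTION: commit time from the source system
    payload: str


def pick_winner(a: BronzeRow, b: BronzeRow) -> BronzeRow:
    """Choose which of two rows for the same key should reach Silver."""
    gap = abs(a.ingest_time - b.ingest_time)
    if gap > OVERLAP_WINDOW:
        # Far apart: ingest time is a good enough ordering.
        return a if a.ingest_time > b.ingest_time else b
    # Inside the window, ingest time can point at the stale row,
    # so fall back to the (assumed) source-side timestamp.
    if a.source_ts != b.source_ts:
        return a if a.source_ts > b.source_ts else b
    # Still tied: prefer the CDC row (an assumption, not the post's rule).
    return a if a.source == "cdc" else b


def apply_to_silver(silver: Dict[str, str], rows: Iterable[BronzeRow]) -> Dict[str, str]:
    """Reduce rows to one winner per key, then upsert or delete."""
    latest: Dict[str, BronzeRow] = {}
    for row in rows:
        current = latest.get(row.key)
        latest[row.key] = row if current is None else pick_winner(current, row)
    for key, row in latest.items():
        if row.op == "D":
            # Without this branch a DELETE would be upserted like any
            # other row and the deleted record would stay alive.
            silver.pop(key, None)
        else:
            silver[key] = row.payload
    return silver
```

The pairwise comparison is a simplification and is not guaranteed to be order-independent when three or more rows share a key. In a Delta Lake MERGE, the delete branch would typically correspond to a `WHEN MATCHED AND <delete condition> THEN DELETE` clause placed before the update clause.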

Full post: https://medium.com/@savlahanish/the-hardest-part-of-our-sap-migration-wasnt-the-data-it-was-timing-e...

Has anyone else hit timing issues during the initial load window on a similar migration?
Curious how others handled the overlap period between snapshot and streaming.