Part 1 of a 5-part series on building an enterprise data platform on Databricks.
When migrating a large retail conglomerate's SAP HANA platform to Databricks, we needed both historical
completeness and near-real-time freshness from day one.
That requirement led to a dual ingestion architecture: Oracle GoldenGate → Kafka → Structured Streaming for
real-time CDC, and JDBC batch for the historical load, with two separate Bronze tables feeding one Silver layer.
This post covers:
→ Why streaming-only and batch-only both failed us
→ The architectural reason we kept two Bronze tables instead of merging at ingestion
→ How we sequenced the two pipelines
→ The tradeoffs we accepted and what we'd do differently
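For readers who want a concrete picture of "two Bronze tables feeding one Silver layer", here is a minimal sketch in plain Python rather than Spark. It is not the code from the full post. The row fields (`key`, `payload`, `changed_at`, `source`) and the rule "latest change wins, CDC wins ties" are assumptions made purely for illustration; the real pipeline may reconcile the two sources differently.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class BronzeRow:
    """One record from either Bronze table (all field names are hypothetical)."""
    key: str         # business key shared by both sources
    payload: str     # stand-in for the actual column values
    changed_at: int  # source change timestamp, e.g. epoch seconds
    source: str      # "batch" (JDBC historical load) or "cdc" (streaming)


def _rank(row: BronzeRow) -> Tuple[int, int]:
    # Assumed precedence: newer change wins; on equal timestamps, CDC beats batch.
    return (row.changed_at, 1 if row.source == "cdc" else 0)


def merge_to_silver(
    batch_rows: Iterable[BronzeRow],
    cdc_rows: Iterable[BronzeRow],
) -> Dict[str, BronzeRow]:
    """Collapse both Bronze inputs into one current row per key."""
    latest: Dict[str, BronzeRow] = {}
    for row in list(batch_rows) + list(cdc_rows):
        current = latest.get(row.key)
        if current is None or _rank(row) > _rank(current):
            latest[row.key] = row
    return latest


batch = [
    BronzeRow("A", "v1", 100, "batch"),
    BronzeRow("B", "v1", 100, "batch"),
    BronzeRow("C", "from-batch", 100, "batch"),
]
cdc = [
    BronzeRow("A", "v2", 150, "cdc"),        # newer change arrives via CDC
    BronzeRow("B", "v0", 90, "cdc"),         # stale event, batch snapshot is newer
    BronzeRow("C", "from-cdc", 100, "cdc"),  # same timestamp, CDC wins the tie
]
silver = merge_to_silver(batch, cdc)
```

Because each Bronze table stays untouched, the precedence rule lives in exactly one place and can be changed and replayed without re-ingesting either source. The sketch ignores deletes and schema drift, which a real CDC merge has to handle.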
Full post: https://medium.com/@savlahanish/why-we-used-two-bronze-tables-instead-of-one-and-why-it-mattered-9c4...
Would love feedback from anyone who's tackled a similar SAP or enterprise CDC migration on Databricks.