Why We Used Two Bronze Tables Instead of One — And... - Databricks Community

Community Articles

Dive into a collaborative space where members like YOU can exchange knowledge, tips, and best practices. Join the conversation today and unlock a wealth of collective wisdom to enhance your experience and drive success.

Part 1 of a 5-part series on building an enterprise data platform on Databricks.

When migrating a large retail conglomerate's SAP HANA platform to Databricks, we needed both historical
completeness and near-real-time freshness from day one.

That requirement led to a dual ingestion architecture — Oracle GoldenGate → Kafka → Structured Streaming for
real-time CDC, and JDBC batch for historical load — with two separate Bronze tables feeding one Silver layer.

This post covers:
→ Why streaming-only and batch-only both failed us
→ The architectural reason we kept two Bronze tables instead of merging at ingestion
→ How we sequenced the two pipelines
→ The tradeoffs we accepted and what we'd do differently

Full post: https://medium.com/@savlahanish/why-we-used-two-bronze-tables-instead-of-one-and-why-it-mattered-9c4...

Would love feedback from anyone who's tackled a similar SAP or enterprise CDC migration on Databricks.