cancel
Showing results for 
Search instead for 
Did you mean: 
Community Articles
Dive into a collaborative space where members like YOU can exchange knowledge, tips, and best practices. Join the conversation today and unlock a wealth of collective wisdom to enhance your experience and drive success.
cancel
Showing results for 
Search instead for 
Did you mean: 

Gold Layer Design on Databricks — MERGE vs Overwrite, Partitioning, SCD Type 2 from SAP

savlahanish27
Databricks Partner

Part 3 of my series on building an enterprise data platform on Databricks is up - this one cover Gold layer design.

The short version: Gold isn't just aggregated Silver. Silver maps to your source system. Gold maps to the business questions your consumers are actually asking - and those two things are almost never the same shape.

What's in the post:

  • MERGE vs overwrites for Gold writes, and the threshold where we switched (40min overwrite runs on vendor_balance at ~10M rows)
  • Partitioning strategy for financial tables: BUKRS+GJAHR for period aggregates, BUKRS alone for balances, no partition on dimensions
  • Z-ordering on LIFNR+MONAT for finance report query patterns
  • SCD Type 2 from SAP master data using a validity window at Gold
  • What doesn't belong in Gold — and the two days we spent auditing a table we eventually deleted
  • Full vendor_balance Gold table in PySpark with MERGE pattern

This is Part 3 of 5. Parts 1 and 2 covered Bronze ingestion (GoldenGate + Kafka + Structured Streaming alongside JDBC historical load) and Silver reconciliation. Part 4 is about why three-layer medallion wasn't enough and what we added.

Full post: Designing the Gold Layer on Databricks — What Belongs and What Doesn’t 

Happy to answer questions on any of the decisions — there were a few where we went back and forth longer than we should have.

0 REPLIES 0