Gold Layer Design on Databricks — MERGE vs Overwrite, Partitioning, SCD Type 2 from SAP

savlahanish27 — Mon, 15 Jun 2026 11:29:39 GMT

Part 3 of my series on building an enterprise data platform on Databricks is up - this one cover Gold layer design.

The short version: Gold isn't just aggregated Silver. Silver maps to your source system. Gold maps to the business questions your consumers are actually asking - and those two things are almost never the same shape.

What's in the post:

MERGE vs overwrites for Gold writes, and the threshold where we switched (40min overwrite runs on vendor_balance at ~10M rows)
Partitioning strategy for financial tables: BUKRS+GJAHR for period aggregates, BUKRS alone for balances, no partition on dimensions
Z-ordering on LIFNR+MONAT for finance report query patterns
SCD Type 2 from SAP master data using a validity window at Gold
What doesn't belong in Gold — and the two days we spent auditing a table we eventually deleted
Full vendor_balance Gold table in PySpark with MERGE pattern

This is Part 3 of 5. Parts 1 and 2 covered Bronze ingestion (GoldenGate + Kafka + Structured Streaming alongside JDBC historical load) and Silver reconciliation. Part 4 is about why three-layer medallion wasn't enough and what we added.

Full post: Designing the Gold Layer on Databricks — What Belongs and What Doesn’t

Happy to answer questions on any of the decisions — there were a few where we went back and forth longer than we should have.

topic Gold Layer Design on Databricks — MERGE vs Overwrite, Partitioning, SCD Type 2 from SAP in Community Articles

Gold Layer Design on Databricks — MERGE vs Overwrite, Partitioning, SCD Type 2 from SAP