What is the best practice for data model in silver layer in lakehouse
02-06-2024 12:11 PM
According to Databricks (https://www.databricks.com/glossary/medallion-architecture), the silver layer typically represents the "enterprise view", with improved data quality over bronze (cleansed, deduplicated, augmented), and is often modelled with 3NF-like normalised data.
The question is: do we really need to normalise the data if our raw data is flat? Normalising would mean more expensive joins when creating gold tables or materialized views in the gold layer. Analytics is one of the main consumers, and eventually there will be APIs that serve the data in our use case. Any help will be greatly appreciated!
02-06-2024 11:37 PM
Hey @param_sen
Given your concerns about expensive joins and prioritizing analytics with flat raw data, here are some suggestions:
- Analyze the most common queries and reports you anticipate. Do they heavily rely on joins across dimensions? If not, the performance impact of normalization might be minimal.
- Prioritize denormalization for key joins: Strategically denormalize specific dimensions frequently joined with fact tables in the Gold layer. This can balance flexibility with query performance.
- Materialized views: Explore materialized views in the Gold layer to pre-join frequently accessed data, potentially mitigating some join costs.
- Begin with a less normalized Silver layer, focusing on cleansing and enriching the data. You can always introduce normalization later if needed.
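To make the trade-off concrete, here is a minimal, hypothetical sketch of the last two points: a flat Silver table plus a pre-joined/pre-aggregated Gold table built from it. SQLite stands in for Spark SQL / Delta Lake purely for illustration, and all table and column names are made up, not from the original thread:

```python
import sqlite3

# Hypothetical example: a flat (denormalized) Silver table and a
# pre-aggregated Gold table derived from it. In a real lakehouse this
# would be Spark SQL on Delta tables; SQLite is used here only so the
# sketch is self-contained and runnable.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Silver: cleansed and deduplicated, but kept flat -- dimension
# attributes stay inline instead of being split into 3NF tables.
cur.execute("""
    CREATE TABLE silver_orders (
        order_id      INTEGER PRIMARY KEY,
        customer_name TEXT,   -- inline, no separate customer dim
        product_name  TEXT,   -- inline, no separate product dim
        amount        REAL
    )
""")
cur.executemany(
    "INSERT INTO silver_orders VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "Widget", 10.0),
        (2, "Alice", "Gadget", 25.0),
        (3, "Bob", "Widget", 10.0),
    ],
)

# Gold: pre-aggregated for the analytics consumers, so their queries
# need no joins at read time (the "materialized view" idea above).
cur.execute("""
    CREATE TABLE gold_customer_spend AS
    SELECT customer_name, SUM(amount) AS total_spend
    FROM silver_orders
    GROUP BY customer_name
""")

rows = cur.execute(
    "SELECT customer_name, total_spend "
    "FROM gold_customer_spend ORDER BY customer_name"
).fetchall()
print(rows)  # [('Alice', 35.0), ('Bob', 10.0)]
```

Because Silver stays flat, the Gold build is a single scan-and-aggregate rather than a multi-way join; normalization could still be introduced later for specific dimensions if update patterns demand it.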
Palash

