What is the best practice for data model in silver layer in lakehouse
02-06-2024 12:11 PM
According to Databricks (https://www.databricks.com/glossary/medallion-architecture), the silver layer typically represents the "enterprise view", with improved data quality over bronze (cleansed, deduplicated, augmented), and is often modelled with 3NF-like normalised data.
The question is: do we really need to normalise the data if our raw data is flat? Normalising would mean more expensive joins when creating gold tables or materialized views in the gold layer. Analytics is one of the main consumers, and eventually there will be APIs that serve the data in our use case. Any help will be greatly appreciated!
02-06-2024 11:37 PM
Hey @param_sen
Given your concerns about expensive joins and prioritizing analytics with flat raw data, here are some suggestions:
- Analyze the most common queries and reports you anticipate. Do they heavily rely on joins across dimensions? If not, the performance impact of normalization might be minimal.
- Prioritize denormalization for key joins: Strategically denormalize specific dimensions frequently joined with fact tables in the Gold layer. This can balance flexibility with query performance.
- Materialized views: Explore materialized views in the Gold layer to pre-join frequently accessed data, potentially mitigating some join costs.
- Begin with a less normalized Silver layer, focusing on cleansing and enriching the data. You can always introduce normalization later if needed.
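To make the trade-off concrete, here is a minimal, hypothetical sketch of the last two points: a flat Silver table plus a pre-joined/pre-aggregated Gold table built from it. SQLite stands in for Spark SQL / Delta Lake purely for illustration, and all table and column names are made up, not from the original thread:

```python
import sqlite3

# Hypothetical example: a flat (denormalized) Silver table and a
# pre-aggregated Gold table derived from it. In a real lakehouse this
# would be Spark SQL on Delta tables; SQLite is used here only so the
# sketch is self-contained and runnable.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Silver: cleansed and deduplicated, but kept flat -- dimension
# attributes stay inline instead of being split into 3NF tables.
cur.execute("""
    CREATE TABLE silver_orders (
        order_id      INTEGER PRIMARY KEY,
        customer_name TEXT,   -- inline, no separate customer dim
        product_name  TEXT,   -- inline, no separate product dim
        amount        REAL
    )
""")
cur.executemany(
    "INSERT INTO silver_orders VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "Widget", 10.0),
        (2, "Alice", "Gadget", 25.0),
        (3, "Bob", "Widget", 10.0),
    ],
)

# Gold: pre-aggregated for the analytics consumers, so their queries
# need no joins at read time (the "materialized view" idea above).
cur.execute("""
    CREATE TABLE gold_customer_spend AS
    SELECT customer_name, SUM(amount) AS total_spend
    FROM silver_orders
    GROUP BY customer_name
""")

rows = cur.execute(
    "SELECT customer_name, total_spend "
    "FROM gold_customer_spend ORDER BY customer_name"
).fetchall()
print(rows)  # [('Alice', 35.0), ('Bob', 10.0)]
```

Because Silver stays flat, the Gold build is a single scan-and-aggregate rather than a multi-way join; normalization could still be introduced later for specific dimensions if update patterns demand it.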
Palash

