08-01-2025 07:26 AM
Hi Databricks Experts,
I'm working on a Medallion Architecture implementation in Databricks, where the source data is coming from SQL Server. I would like some advice on how to handle the bronze layer correctly and cost-effectively.
Should I create a bronze schema in Unity Catalog and store the raw ingested tables as-is (in Delta format)?
Or is it better to simply save the raw data (as CSV or Parquet) in an ADLS container without registering Delta tables in Unity Catalog?
Additionally, from a cost perspective:
Is there any cost involved in just creating tables in Unity Catalog?
Will saving the raw data directly in ADLS (as Parquet or CSV) be more cost-efficient?
Which components incur Azure Storage vs. Databricks (DBU/compute) charges?
I'm looking to understand the proper architectural approach that balances cost and best practices for production use cases. Any suggestions or experiences would be greatly appreciated.
Thanks!
08-02-2025 02:52 PM
Let me break down the bronze layer approach for your Medallion Architecture with detailed cost and architectural considerations.
Delta Tables in Unity Catalog:
Why This Is Better Than Raw Files (a minimal ingestion sketch follows this list):
- Schema Evolution - Handle SQL Server schema changes gracefully
- ACID Transactions - Consistent data during ingestion
- Time Travel - Debug data issues and recover from bad loads
- Performance - Z-ordering and file compaction
- Governance - Unity Catalog lineage and access control
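For example, here's a minimal batch sketch of pulling one SQL Server table into a Unity Catalog bronze table. A JDBC pull is just one ingestion option, and the hostname, secret scope, and `lakehouse.bronze.customers` name are placeholders, not your actual setup:

```python
# Read one SQL Server table over JDBC; connection details are placeholders
df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")
    .option("dbtable", "dbo.customers")
    .option("user", dbutils.secrets.get("sql-scope", "username"))
    .option("password", dbutils.secrets.get("sql-scope", "password"))
    .load())

# Append into a Delta table registered in Unity Catalog
(df.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")               # schema evolution for new columns
    .saveAsTable("lakehouse.bronze.customers"))  # UC-governed bronze table
```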
Cost Breakdown Analysis:
FREE Components (example DDL after this list):
- Creating schemas/catalogs
- Registering tables (metadata only)
- Unity Catalog metadata storage
- Lineage tracking
- Access control policies
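To illustrate: all of the following are metadata-only operations with no per-object charge; you only pay for the compute that runs the statements (catalog, schema, and group names are made up):

```python
spark.sql("CREATE CATALOG IF NOT EXISTS lakehouse")
spark.sql("CREATE SCHEMA IF NOT EXISTS lakehouse.bronze")
spark.sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.bronze.orders (
        order_id BIGINT,
        order_date DATE,
        amount DECIMAL(18, 2)
    )
""")
# Access control policy, also free to define
spark.sql("GRANT SELECT ON SCHEMA lakehouse.bronze TO `data_engineers`")
```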
PAID Components:
- DBU consumption during queries/operations
- Underlying storage (ADLS/S3)
- Compute for optimization operations
Detailed Cost Analysis
ADLS vs Databricks Charges:
ADLS Storage Costs (worked example below):
- Standard tier: $0.0184/GB/month
- Cool tier: $0.01/GB/month (for archival bronze data)
- Transaction costs: ~$0.0004 per 10k operations
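A quick back-of-the-envelope using those rates (the 1 TB bronze volume is an assumption, not a benchmark):

```python
# Monthly ADLS storage cost for an assumed 1 TB bronze layer
hot_rate, cool_rate = 0.0184, 0.01   # $/GB/month, from the rates above
gb = 1024                            # 1 TB
print(f"Hot tier:  ${gb * hot_rate:.2f}/month")   # ~$18.84
print(f"Cool tier: ${gb * cool_rate:.2f}/month")  # ~$10.24
```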
Databricks Costs (example maintenance commands below):
- DBU consumption during:
* Data ingestion jobs
* Table optimization (OPTIMIZE, VACUUM)
* Query execution
* Cluster startup/shutdown
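In practice, the maintenance items look like this; the table name and retention window are illustrative, and each statement consumes DBUs while it runs:

```python
# Compact small files and cluster by a commonly filtered column
spark.sql("OPTIMIZE lakehouse.bronze.orders ZORDER BY (order_date)")

# Remove files older than a 7-day time-travel window (168 hours)
spark.sql("VACUUM lakehouse.bronze.orders RETAIN 168 HOURS")
```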
Cost Optimization Tips (sample cluster spec after this list):
- Use spot instances: 60-70% savings
- Auto-terminate idle clusters
- Schedule optimization during off-peak hours
- Use smaller clusters for bronze layer
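As a sketch, a small bronze-ingestion cluster along those lines might look like this as a Databricks Clusters API payload; the node type, runtime version, and sizing are assumptions to adapt to your workload:

```python
# Hedged sketch of a small bronze-ingestion cluster spec
bronze_cluster = {
    "cluster_name": "bronze-ingest",
    "spark_version": "15.4.x-scala2.12",      # example LTS runtime
    "node_type_id": "Standard_DS3_v2",        # small Azure VM for bronze loads
    "num_workers": 2,
    "autotermination_minutes": 15,            # auto-terminate idle clusters
    "azure_attributes": {
        "availability": "SPOT_WITH_FALLBACK_AZURE",  # spot with on-demand fallback
        "spot_bid_max_price": -1,                    # cap at the on-demand price
    },
}
```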
Final Recommendation
Use Unity Catalog Delta Tables for Bronze Layer because:
- Cost-effective: Storage costs are similar to Parquet, but with better compression
- Production-ready: ACID transactions, schema evolution, time travel
- Governance: Built-in lineage, access control, data discovery
- Performance: Better query performance for silver/gold transformations
- Maintenance: Automated optimization reduces long-term costs
The additional UC overhead is minimal compared to the operational benefits, especially in production environments where data quality and governance are crucial.
08-03-2025 10:49 PM
Thanks @lingareddy_Alva for your response. Considering cost-effectiveness and the Medallion Architecture, I'll create bronze tables in Unity Catalog rather than saving the files directly to the ADLS container for the bronze layer.
Also worth considering, the features of Delta tables:
- Schema Evolution - Handle SQL Server schema changes gracefully
- ACID Transactions - Consistent data during ingestion
- Time Travel - Debug data issues and recover from bad loads
- Performance - Z-ordering and file compaction
- Governance - Unity Catalog lineage and access control
08-03-2025 11:34 PM
Create the bronze table using Auto Loader and store it in Delta format. Although it might seem like you'll only read from bronze once to populate the silver layer, in real-world production scenarios you'll often need to re-read from bronze: for reprocessing, debugging, or new use cases. Using Delta ensures those repeated reads are performant and reliable, compared to reading directly from raw formats like JSON or CSV.
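A minimal Auto Loader sketch for that pattern, assuming the SQL Server extracts land as Parquet files in ADLS (the paths, storage account, and table name are placeholders):

```python
# Incrementally ingest newly landed files into a Unity Catalog bronze table
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation",
            "abfss://bronze@<storage>.dfs.core.windows.net/_schemas/orders")
    .load("abfss://landing@<storage>.dfs.core.windows.net/sqlserver/orders/")
    .writeStream
    .option("checkpointLocation",
            "abfss://bronze@<storage>.dfs.core.windows.net/_checkpoints/orders")
    .option("mergeSchema", "true")     # tolerate upstream schema drift
    .trigger(availableNow=True)        # process available files, then stop
    .toTable("lakehouse.bronze.orders"))
```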
08-03-2025 11:43 PM
Thanks @pgo!