<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Best Practices for Designing Bronze Layer with SQL Server Source in Medallion Architecture in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/best-practices-for-designing-bronze-layer-with-sql-server-source/m-p/127273#M47909</link>
    <description>&lt;P&gt;Create the bronze table using Auto Loader and store it in Delta format. Although it might seem like you'll only read from bronze once to populate the silver layer, in real-world production scenarios, you'll often need to re-read from bronze—for reprocessing, debugging, or new use cases. Using Delta ensures those repeated reads are performant and reliable, compared to reading directly from raw formats like JSON or CSV.&lt;/P&gt;</description>
    <pubDate>Mon, 04 Aug 2025 06:34:08 GMT</pubDate>
    <dc:creator>pgo</dc:creator>
    <dc:date>2025-08-04T06:34:08Z</dc:date>
    <item>
      <title>Best Practices for Designing Bronze Layer with SQL Server Source in Medallion Architecture</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practices-for-designing-bronze-layer-with-sql-server-source/m-p/127188#M47882</link>
      <description>&lt;P&gt;Hi Databricks Experts,&lt;BR /&gt;&lt;BR /&gt;I'm working on a Medallion Architecture implementation in Databricks, where the source data is coming from SQL Server. I would like some advice on how to handle the bronze layer correctly and cost-effectively.&lt;BR /&gt;&lt;BR /&gt;Should I create a bronze schema in Unity Catalog and store the raw ingested tables as-is (in Delta format)?&lt;BR /&gt;&lt;BR /&gt;Or is it better to simply save the raw data (as CSV or Parquet) in an ADLS container without registering Delta tables in Unity Catalog?&lt;BR /&gt;&lt;BR /&gt;Additionally, from a cost perspective:&lt;BR /&gt;&lt;BR /&gt;Is there any cost involved in just creating tables in Unity Catalog?&lt;BR /&gt;&lt;BR /&gt;Will saving the raw data directly in ADLS (as Parquet or CSV) be more cost-efficient?&lt;BR /&gt;&lt;BR /&gt;Which components incur Azure Storage vs. Databricks (DBU/compute) charges?&lt;BR /&gt;&lt;BR /&gt;I'm looking to understand the proper architectural approach that balances cost and best practices for production use cases. Any suggestions or experiences would be greatly appreciated.&lt;BR /&gt;&lt;BR /&gt;Thanks!&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 01 Aug 2025 14:26:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practices-for-designing-bronze-layer-with-sql-server-source/m-p/127188#M47882</guid>
      <dc:creator>Mohan_Baabu1</dc:creator>
      <dc:date>2025-08-01T14:26:37Z</dc:date>
    </item>
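The trade-off raised in the question (register bronze tables in Unity Catalog vs. dropping raw files in ADLS) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the catalog, schema, and storage-account names are hypothetical placeholders, and `spark` is assumed to be an active Databricks Spark session passed in by the caller. Registering the schema and table is a metadata-only operation; only the underlying ADLS storage and the compute that writes data incur charges.

```python
# Sketch: creating a bronze schema in Unity Catalog backed by an ADLS
# container. All names (catalog "main", schema "bronze", storage account)
# are hypothetical examples.

def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build the three-level Unity Catalog name (catalog.schema.table)."""
    return f"{catalog}.{schema}.{table}"

def create_bronze_schema(spark,
                         catalog: str = "main",
                         schema: str = "bronze",
                         location: str = "abfss://bronze@mystorageacct.dfs.core.windows.net/"):
    """Create the bronze schema and point its managed storage at ADLS.

    `spark` is an active Databricks Spark session; this issues
    metadata-only DDL, so no data is moved or copied.
    """
    spark.sql(
        f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema} "
        f"MANAGED LOCATION '{location}'"
    )

# Tables ingested from SQL Server would then be registered under names
# like qualified_name("main", "bronze", "customers").
```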
    <item>
      <title>Re: Best Practices for Designing Bronze Layer with SQL Server Source in Medallion Architecture</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practices-for-designing-bronze-layer-with-sql-server-source/m-p/127232#M47896</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/177656"&gt;@Mohan_Baabu1&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Let me break down the bronze layer approach for your Medallion Architecture with detailed cost and architectural considerations.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Tables in Unity Catalog:&lt;/STRONG&gt;&lt;BR /&gt;Why This is Better Than Raw Files:&lt;BR /&gt;- Schema Evolution - Handle SQL Server schema changes gracefully&lt;BR /&gt;- ACID Transactions - Consistent data during ingestion&lt;BR /&gt;- Time Travel - Debug data issues and recover from bad loads&lt;BR /&gt;- Performance - Z-ordering and file compaction&lt;BR /&gt;- Governance - Unity Catalog lineage and access control&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Cost Breakdown Analysis:&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;FREE Components:&lt;/STRONG&gt;&lt;BR /&gt;- Creating schemas/catalogs&lt;BR /&gt;- Registering tables (metadata only)&lt;BR /&gt;- Unity Catalog metadata storage&lt;BR /&gt;- Lineage tracking&lt;BR /&gt;- Access control policies&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;span class="lia-unicode-emoji" title=":money_bag:"&gt;💰&lt;/span&gt; PAID Components:&lt;/STRONG&gt;&lt;BR /&gt;- DBU consumption during queries/operations&lt;BR /&gt;- Underlying storage (ADLS/S3)&lt;BR /&gt;- Compute for optimization operations&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Detailed Cost Analysis&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;ADLS vs Databricks Charges:&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;ADLS Storage Costs:&lt;/STRONG&gt;&lt;BR /&gt;- Standard tier: $0.0184/GB/month&lt;BR /&gt;- Cool tier: $0.01/GB/month (for archival bronze data)&lt;BR /&gt;- Transaction costs: ~$0.0004 per 10k operations&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Databricks Costs:&lt;/STRONG&gt;&lt;BR /&gt;- DBU consumption during:&lt;BR /&gt;* Data ingestion jobs&lt;BR /&gt;* Table optimization (OPTIMIZE, VACUUM)&lt;BR /&gt;* Query execution&lt;BR /&gt;* Cluster startup/shutdown&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Cost Optimization Tips:&lt;/STRONG&gt;&lt;BR /&gt;- Use spot instances: 60-70% savings&lt;BR /&gt;- Auto-terminate idle clusters&lt;BR /&gt;- Schedule optimization during off-peak hours&lt;BR /&gt;- Use smaller clusters for bronze layer&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Final Recommendation&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;Use Unity Catalog Delta Tables for Bronze Layer because:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;- Cost-effective: Storage costs are similar to Parquet, with better compression&lt;BR /&gt;- Production-ready: ACID transactions, schema evolution, time travel&lt;BR /&gt;- Governance: Built-in lineage, access control, data discovery&lt;BR /&gt;- Performance: Better query performance for silver/gold transformations&lt;BR /&gt;- Maintenance: Automated optimization reduces long-term costs&lt;/P&gt;&lt;P&gt;The additional UC overhead is minimal compared to the operational benefits, especially in production environments where data quality and governance are crucial.&lt;/P&gt;</description>
      <pubDate>Sat, 02 Aug 2025 21:52:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practices-for-designing-bronze-layer-with-sql-server-source/m-p/127232#M47896</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-08-02T21:52:47Z</dc:date>
    </item>
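The OPTIMIZE and VACUUM operations mentioned in the answer above are the main recurring DBU cost of a bronze Delta table; scheduling them off-peak on a small cluster is what keeps that cost low. A minimal sketch of such a maintenance job follows, assuming a hypothetical table name and Z-order column, with `spark` being an active Databricks session:

```python
# Sketch of a periodic bronze-table maintenance job: compact small files
# with OPTIMIZE/ZORDER, then VACUUM files older than the time-travel
# retention window. Table and column names are hypothetical.

def maintenance_statements(table: str, zorder_col: str, retain_hours: int = 168):
    """Return the SQL statements a maintenance run would execute.

    168 hours (7 days) matches the default Delta VACUUM retention, which
    bounds how far back time travel remains possible.
    """
    return [
        f"OPTIMIZE {table} ZORDER BY ({zorder_col})",    # compact small files
        f"VACUUM {table} RETAIN {retain_hours} HOURS",   # drop stale data files
    ]

def run_maintenance(spark, table: str = "main.bronze.customers",
                    zorder_col: str = "customer_id"):
    """Execute the maintenance statements on an active Spark session."""
    for stmt in maintenance_statements(table, zorder_col):
        spark.sql(stmt)
```

On Databricks this would typically be wired into a scheduled Workflows job rather than run interactively.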
    <item>
      <title>Re: Best Practices for Designing Bronze Layer with SQL Server Source in Medallion Architecture</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practices-for-designing-bronze-layer-with-sql-server-source/m-p/127269#M47906</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/24053"&gt;@lingareddy_Alva&lt;/a&gt;&amp;nbsp;for your response. Considering cost-effectiveness and the medallion architecture, I will create bronze tables in Unity Catalog rather than saving files directly to the ADLS container for the bronze layer.&lt;/P&gt;&lt;P&gt;I'm also considering these Delta table features:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;- Schema Evolution - Handle SQL Server schema changes gracefully&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;- ACID Transactions - Consistent data during ingestion&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;- Time Travel - Debug data issues and recover from bad loads&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;- Performance - Z-ordering and file compaction&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;- Governance - Unity Catalog lineage and access control&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 04 Aug 2025 05:49:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practices-for-designing-bronze-layer-with-sql-server-source/m-p/127269#M47906</guid>
      <dc:creator>Mohan_Baabu1</dc:creator>
      <dc:date>2025-08-04T05:49:02Z</dc:date>
    </item>
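Of the Delta features listed above, time travel is the one used to "debug data issues and recover from bad loads": you read an older snapshot of the bronze table by version or timestamp. A minimal sketch, assuming a hypothetical table name and an active Spark session passed in as `spark`:

```python
# Sketch: reading a historical snapshot of a bronze Delta table
# (Delta time travel). The table name and versions are hypothetical.

def snapshot_options(version=None, timestamp=None) -> dict:
    """Reader options selecting a historical Delta snapshot.

    Exactly one of `version` (integer table version) or `timestamp`
    (e.g. "2025-08-01") is expected; version wins if both are given.
    """
    if version is not None:
        return {"versionAsOf": str(version)}
    if timestamp is not None:
        return {"timestampAsOf": timestamp}
    return {}  # no option -> latest snapshot

def read_snapshot(spark, table: str = "main.bronze.customers",
                  version=None, timestamp=None):
    """Return a DataFrame over the requested snapshot of `table`."""
    reader = spark.read
    for key, val in snapshot_options(version, timestamp).items():
        reader = reader.option(key, val)
    return reader.table(table)
```

Comparing `read_snapshot(spark, version=41)` against the current table is a cheap way to diagnose what a bad load changed before deciding whether to restore.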
    <item>
      <title>Re: Best Practices for Designing Bronze Layer with SQL Server Source in Medallion Architecture</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practices-for-designing-bronze-layer-with-sql-server-source/m-p/127273#M47909</link>
      <description>&lt;P&gt;Create the bronze table using Auto Loader and store it in Delta format. Although it might seem like you'll only read from bronze once to populate the silver layer, in real-world production scenarios, you'll often need to re-read from bronze—for reprocessing, debugging, or new use cases. Using Delta ensures those repeated reads are performant and reliable, compared to reading directly from raw formats like JSON or CSV.&lt;/P&gt;</description>
      <pubDate>Mon, 04 Aug 2025 06:34:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practices-for-designing-bronze-layer-with-sql-server-source/m-p/127273#M47909</guid>
      <dc:creator>pgo</dc:creator>
      <dc:date>2025-08-04T06:34:08Z</dc:date>
    </item>
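The Auto Loader approach recommended above can be sketched as a small ingestion function. This is an illustrative sketch only: the ADLS paths, checkpoint location, table name, and source file format (Parquet extracts from SQL Server) are all assumptions, and `spark` is an active Databricks Spark session supplied by the caller.

```python
# Sketch: incremental bronze ingestion with Auto Loader (cloudFiles),
# landing raw files from ADLS into a Unity Catalog Delta table.
# Paths and names below are hypothetical placeholders.

def cloudfiles_options(fmt: str, schema_location: str) -> dict:
    """Auto Loader options: source format plus the schema-tracking
    location that enables schema inference and evolution."""
    return {
        "cloudFiles.format": fmt,
        "cloudFiles.schemaLocation": schema_location,
    }

def bronze_autoloader(spark,
                      source: str = "abfss://landing@mystorageacct.dfs.core.windows.net/sqlserver/customers/",
                      checkpoint: str = "abfss://landing@mystorageacct.dfs.core.windows.net/_checkpoints/bronze_customers",
                      table: str = "main.bronze.customers"):
    """Start an incremental load of new files into the bronze table."""
    stream = spark.readStream.format("cloudFiles")
    for key, val in cloudfiles_options("parquet", checkpoint).items():
        stream = stream.option(key, val)
    return (stream.load(source)
            .writeStream
            .option("checkpointLocation", checkpoint)  # exactly-once tracking
            .trigger(availableNow=True)                # batch-style incremental run
            .toTable(table))
```

With `availableNow=True` the stream processes whatever files have arrived and then stops, so the same job can run on a schedule while the checkpoint guarantees each file is ingested only once.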
    <item>
      <title>Re: Best Practices for Designing Bronze Layer with SQL Server Source in Medallion Architecture</title>
      <link>https://community.databricks.com/t5/data-engineering/best-practices-for-designing-bronze-layer-with-sql-server-source/m-p/127276#M47911</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/28850"&gt;@pgo&lt;/a&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 04 Aug 2025 06:43:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-practices-for-designing-bronze-layer-with-sql-server-source/m-p/127276#M47911</guid>
      <dc:creator>Mohan_Baabu1</dc:creator>
      <dc:date>2025-08-04T06:43:39Z</dc:date>
    </item>
  </channel>
</rss>

