Administration & Architecture

Transitioning Approach for Evolving EDW

HitMah
New Contributor

As the EDW will continue to evolve with new data and business logic during the multi-phased migration, what architectural strategies and design patterns can minimize rework when migrating from an evolving Enterprise Data Warehouse (EDW) to Databricks?

I've read about the ETL-First approach, but it isn't well documented and there are few detailed videos from Databricks explaining it.

3 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @HitMah ,

Databricks' recommendation is to use the medallion architecture. A medallion architecture is a data design pattern used to logically organize data in a lakehouse, with the goal of incrementally and progressively improving the structure and quality of data as it flows through each layer of the architecture (from Bronze ⇒ Silver ⇒ Gold layer tables).

So we have 3 layers:

  • The Bronze layer is where we land all the data from external source systems. We should not apply any transformations to the source data at this layer, except for adding some metadata such as load time, process id, etc. The main purpose of this layer is to keep a historical archive of the source and to allow reprocessing when needed without rereading the data from the source system.
  • In the Silver layer of the lakehouse, data from the Bronze layer is cleansed, deduplicated and merged. The Silver layer provides an "enterprise view" (e.g. master customers, stores, non-duplicated transactions and cross-reference tables).
  • The Gold layer of the lakehouse is designed for business users and contains the core business logic. The final layer of data transformations and data quality rules is applied here, so this is where Kimball-style star schemas and Inmon-style data marts typically fit.
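
To make this concrete, here is a minimal PySpark sketch of the three layers as you might run it in a Databricks notebook. All table and path names are hypothetical, and it assumes the bronze/silver/gold schemas already exist; it illustrates the pattern, not a production pipeline.

```python
from pyspark.sql import functions as F

# Bronze: land the source data as-is, adding only load metadata.
(spark.read.format("json").load("/landing/orders/")
    .withColumn("_load_ts", F.current_timestamp())
    .withColumn("_source_file", F.col("_metadata.file_path"))
    .write.mode("append").saveAsTable("bronze.orders"))

# Silver: cleanse and deduplicate into an enterprise view.
(spark.table("bronze.orders")
    .filter(F.col("order_id").isNotNull())
    .dropDuplicates(["order_id"])
    .write.mode("overwrite").saveAsTable("silver.orders"))

# Gold: apply business logic for consumption, e.g. a daily revenue fact.
(spark.table("silver.orders")
    .groupBy("customer_id", F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("daily_revenue"))
    .write.mode("overwrite").saveAsTable("gold.daily_revenue"))
```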

In the lakehouse data engineering paradigm, the ELT methodology is typically followed (instead of classical ETL). You can read about the difference between the two approaches in the articles below:

Extract Transform Load (ETL) | Databricks

ETL vs ELT - Difference Between Data-Processing Approaches - AWS
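
The key difference in code is the ordering: with ELT the raw data is loaded first, untouched, and the transformation runs afterwards inside Databricks itself rather than in an external ETL server. A small sketch with hypothetical names:

```python
# 1. Load: copy raw files into a table with no transformation applied.
(spark.read.format("csv").option("header", "true")
    .load("/landing/customers/")
    .write.mode("append").saveAsTable("bronze.customers_raw"))

# 2. Transform: reshape after the load, using the lakehouse engine itself.
spark.sql("""
    CREATE OR REPLACE TABLE silver.customers AS
    SELECT DISTINCT customer_id, trim(name) AS name, upper(country) AS country
    FROM bronze.customers_raw
    WHERE customer_id IS NOT NULL
""")
```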

And when you're migrating an existing EDW to the Lakehouse, you can use various tools and accelerators to make the process easier. You can leverage:

- Lakeflow Connect, which supports ingesting data from legacy database systems as well as new data coming from modern SaaS platforms like Salesforce or Workday.

- Lakehouse Federation, to easily and quickly expose data marts from the legacy data warehouse inside Databricks.

- BladeBridge, to support automatic conversion of legacy code (e.g., Teradata BTEQ, Oracle stored procedures, third-party ETL pipelines).

- Databricks Workflows, to orchestrate complex pipelines.

- Lakeflow Declarative Pipelines, which simplify batch and streaming ETL with automated reliability and built-in data quality (see the sketch below).
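
To give you a feel for that last one, here is a minimal sketch of a declarative pipeline using the Python `dlt` API. The source path, table names and the quality rule are hypothetical, and this runs as a pipeline rather than as an interactive notebook; the expectation decorator is what gives you the built-in data quality checks.

```python
import dlt

@dlt.table(comment="Raw orders landed incrementally from cloud storage.")
def bronze_orders():
    # Auto Loader picks up only new files on each pipeline update.
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/landing/orders/"))

@dlt.table(comment="Cleansed orders with a data quality rule enforced.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def silver_orders():
    return dlt.read_stream("bronze_orders").dropDuplicates(["order_id"])
```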

If you have any other questions, feel free to ask 🙂

HitMah
New Contributor

Thank you for your response.

I'm familiar with the Medallion Architecture concept in Databricks. My question is more focused on how to effectively run our existing Enterprise Data Warehouse (EDW) and Databricks in parallel during the migration phase.

For example, we currently need to develop a new data mart in the EDW, but our Databricks platform is still being built; it may take another 6+ months before it's production-ready, and potentially another year before it can fully meet the data mart requirements.

Given this overlap, what would be the best architectural approach or recommended practices to minimize rework when we eventually migrate this new data mart to Databricks?

Thanks
-werners-
Esteemed Contributor III

There is no single approach; it depends on your organization.
First you have the 'impact axis': lean and mean vs. big bang.
Next you also have bottom-up (first fix Bronze and work upwards) or top-down (focus on Gold and read data from your legacy EDW using connections).
So a lot is possible.
I am always a fan of focusing on a single stream/cube/data mart and starting by migrating that. That is more of a lean-and-mean/bottom-up approach, which I prefer, but you might think otherwise.
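
For the top-down variant, Lakehouse Federation is what lets your Gold-layer consumers query the legacy EDW in place while the lower layers are still being migrated. A sketch of what that setup can look like, assuming a SQL Server-based EDW, with all names and the secret scope hypothetical (check the docs for your EDW's connector):

```python
# Create a connection and a foreign catalog over the legacy EDW.
spark.sql("""
    CREATE CONNECTION IF NOT EXISTS legacy_edw_conn TYPE sqlserver
    OPTIONS (host 'edw.example.com', port '1433',
             user secret('edw_scope', 'user'),
             password secret('edw_scope', 'password'))
""")
spark.sql("""
    CREATE FOREIGN CATALOG IF NOT EXISTS legacy_edw
    USING CONNECTION legacy_edw_conn OPTIONS (database 'edw_prod')
""")

# Gold-layer consumers can now read legacy marts without copying data first.
df = spark.table("legacy_edw.dbo.sales_mart")
df.display()
```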

Just don't assume that your Databricks Lakehouse will not change. With planning you can limit the amount of reengineering, but you cannot avoid it. Basically it is the same as an EDW but far less strict, which has its pros and cons.
