Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Summarized Data from Source system into Bronze

TinaDouglass
New Contributor

Hello,

We are just starting with Databricks. Quick question: we have a table in our legacy source system that summarizes values used on legacy reports and for payments in our legacy system. The business wants a dashboard on our new platform that contains some of these metrics. I am thinking we should investigate the transactional tables used in our legacy system, bring those into Databricks, and have the summarization for analytics and reporting happen in Databricks. If we brought over the summarized data instead, we would risk the legacy summary measures differing from the results whenever the transactional data is later brought into Databricks and summarized for some other reason. Also, our legacy system will be retired in 1-3 years. Suggestions? Thoughts?

2 REPLIES

Khaja_Zaffer
Contributor

Hello @TinaDouglass 

Good day!!

First of all, welcome to Databricks, a unified platform.

I took some time to answer your query in detail. 

 

Use Databricks' medallion architecture (also called the bronze-silver-gold pattern) to structure your data pipeline.
This is a best-practice pattern for ingesting raw data, refining it, and creating consumable aggregates. It ensures consistency, traceability, and flexibility within Databricks' Unity Catalog.
 
Step 1: Data Ingestion (Bronze Layer - Raw Transactional Data)
Ingest the transactional tables from your legacy system into Databricks as-is (raw format). Use Databricks ingestion tools such as Auto Loader for streaming/batch loads, or Databricks Connect/JDBC for direct pulls if the legacy system supports it. Store the data in Delta format, which is optimized and ACID-compliant, so you can keep the raw records as a source of truth.
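As a rough sketch of the Bronze step, an Auto Loader stream on a Databricks cluster might look like the following (the paths, file format, and table name are placeholders for your own legacy extract location):

```python
# Bronze: incrementally ingest raw legacy files with Auto Loader.
# Runs on a Databricks cluster where `spark` is the built-in session;
# all paths and the target table name below are illustrative.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/Volumes/legacy/_schemas/payments")
    .load("/Volumes/legacy/raw/payments/")
    .writeStream
    .option("checkpointLocation", "/Volumes/legacy/_checkpoints/payments")
    .trigger(availableNow=True)       # process all new files, then stop
    .toTable("bronze.legacy_payments"))
```

The `availableNow` trigger lets the same job run as a scheduled batch while still only picking up files it has not seen before.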
 
Step 2: Data Cleansing and Transformation (Silver Layer - Refined Data)
Process the bronze data to create cleaned, validated transactional records. Apply business rules, deduplication, and enrichment here. 
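To make the Silver step concrete, here is a minimal plain-Python sketch of the dedup-and-validate logic (in practice you would express this in PySpark over the bronze Delta table; the field names `txn_id`, `amount`, and `updated_at` are illustrative):

```python
def refine(records):
    """Silver step sketch: drop invalid rows and deduplicate on transaction id,
    keeping the latest version of each record. Field names are illustrative."""
    latest = {}
    for rec in records:
        # Business rule example: amounts must be present and non-negative.
        if rec.get("amount") is None or rec["amount"] < 0:
            continue
        key = rec["txn_id"]
        # Keep only the most recently updated version of each transaction.
        if key not in latest or rec["updated_at"] > latest[key]["updated_at"]:
            latest[key] = rec
    return list(latest.values())
```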
 
Step 3: Aggregation and Summarization (Gold Layer - Business-Ready Metrics)
Summarize the silver data into metrics matching your legacy table's structure, plus any enhancements for the new dashboard. From the gold layer, you can share this structured data with data analysts, ML engineers, or data scientists to get the most out of it.
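The Gold step is essentially a group-by over the refined transactions. A small plain-Python sketch of recomputing a legacy-style summary (the grouping keys `period`/`account` and the metric names are assumptions; on Databricks this would be a PySpark `groupBy().agg()` writing a gold Delta table):

```python
from collections import defaultdict

def summarize(transactions):
    """Gold step sketch: recompute the legacy summary from refined
    transactions by grouping on period and account. Column names
    are illustrative, not from the original system."""
    totals = defaultdict(lambda: {"payment_total": 0.0, "txn_count": 0})
    for t in transactions:
        key = (t["period"], t["account"])
        totals[key]["payment_total"] += t["amount"]
        totals[key]["txn_count"] += 1
    return {k: dict(v) for k, v in totals.items()}
```

Because the gold metrics are derived from the same transactional records each time, this avoids the drift risk of copying over pre-summarized legacy numbers.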
 
In Databricks, incremental processing reads and updates only newly available data, and you can use serverless compute to optimize billing (it is cost-effective, simplifies operations, and supports both batch and streaming workloads).
 
Reference: https://www.databricks.com/glossary/medallion-architecture

I hope this helps you. 

Have a great day!

LokeshReddy
New Contributor II

Hi, yes, you're thinking in the right direction. Bring the underlying transactional tables into the raw Bronze layer, refine them in Silver, and then in Gold recreate the summary or build a proper star/snowflake model. This way you avoid inconsistencies and are ready when the legacy system is retired.
