Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Summarized Data from Source system into Bronze

TinaDouglass
New Contributor

Hello,

We are just starting with Databricks. Quick question: we have a table in our legacy source system that summarizes values used on legacy reports and for payments in our legacy system. The business wants a dashboard on our new platform that contains some of these metrics. I am thinking we should investigate the transactional tables used in our legacy system, bring those into Databricks, and have the summarization for analytics and reporting happen in Databricks. If we brought over the summarized data instead, we would risk the legacy summary measures differing from the results whenever the transactional data is later brought into Databricks and summarized for some other reason. Also, our legacy system will be retired in 1-3 years. Suggestions? Thoughts?

2 REPLIES

Khaja_Zaffer
Contributor

Hello @TinaDouglass 

Good day!!

First of all, welcome to Databricks, a unified platform.

I took some time to answer your query in detail. 

 

Use Databricks' medallion architecture (also called the bronze-silver-gold pattern) to structure your data pipeline.
This is a best-practice pattern for ingesting raw data, refining it, and creating consumable aggregates. It ensures consistency, traceability, and flexibility within Databricks' Unity Catalog.
 
Step 1: Data Ingestion (Bronze Layer - Raw Transactional Data)
Ingest the transactional tables from your legacy system into Databricks as-is (raw format). Use Databricks ingestion tools such as Auto Loader for streaming/batch loads, or Databricks Connect/JDBC for direct pulls if the legacy system supports it. Store the data in Delta format, which is optimized and ACID-compliant, so you can keep the raw records as a source of truth.
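As a rough sketch of the Bronze step, an Auto Loader stream on a Databricks cluster might look like the following (the paths, file format, and table name are placeholders for your own legacy extract location):

```python
# Bronze: incrementally ingest raw legacy files with Auto Loader.
# Runs on a Databricks cluster where `spark` is the built-in session;
# all paths and the target table name below are illustrative.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/Volumes/legacy/_schemas/payments")
    .load("/Volumes/legacy/raw/payments/")
    .writeStream
    .option("checkpointLocation", "/Volumes/legacy/_checkpoints/payments")
    .trigger(availableNow=True)       # process all new files, then stop
    .toTable("bronze.legacy_payments"))
```

The `availableNow` trigger lets the same job run as a scheduled batch while still only picking up files it has not seen before.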
 
Step 2: Data Cleansing and Transformation (Silver Layer - Refined Data)
Process the bronze data to create cleaned, validated transactional records. Apply business rules, deduplication, and enrichment here. 
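To make the Silver step concrete, here is a minimal plain-Python sketch of the dedup-and-validate logic (in practice you would express this in PySpark over the bronze Delta table; the field names `txn_id`, `amount`, and `updated_at` are illustrative):

```python
def refine(records):
    """Silver step sketch: drop invalid rows and deduplicate on transaction id,
    keeping the latest version of each record. Field names are illustrative."""
    latest = {}
    for rec in records:
        # Business rule example: amounts must be present and non-negative.
        if rec.get("amount") is None or rec["amount"] < 0:
            continue
        key = rec["txn_id"]
        # Keep only the most recently updated version of each transaction.
        if key not in latest or rec["updated_at"] > latest[key]["updated_at"]:
            latest[key] = rec
    return list(latest.values())
```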
 
Step 3: Aggregation and Summarization (Gold Layer - Business-Ready Metrics)
Summarize the silver data into metrics matching your legacy table's structure, plus any enhancements for the new dashboard. From the gold layer, you can share this structured data with data analysts, ML engineers, or data scientists to get the most out of it.
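The Gold step is essentially a group-by over the refined transactions. A small plain-Python sketch of recomputing a legacy-style summary (the grouping keys `period`/`account` and the metric names are assumptions; on Databricks this would be a PySpark `groupBy().agg()` writing a gold Delta table):

```python
from collections import defaultdict

def summarize(transactions):
    """Gold step sketch: recompute the legacy summary from refined
    transactions by grouping on period and account. Column names
    are illustrative, not from the original system."""
    totals = defaultdict(lambda: {"payment_total": 0.0, "txn_count": 0})
    for t in transactions:
        key = (t["period"], t["account"])
        totals[key]["payment_total"] += t["amount"]
        totals[key]["txn_count"] += 1
    return {k: dict(v) for k, v in totals.items()}
```

Because the gold metrics are derived from the same transactional records each time, this avoids the drift risk of copying over pre-summarized legacy numbers.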
 
In Databricks, incremental processing reads and updates only newly available data, and you can use serverless compute to optimize billing (it is cost-effective, simplifies operations, and supports both batch and streaming workloads).
 
Reference: https://www.databricks.com/glossary/medallion-architecture

I hope this helps you. 

Have a great day!

LokeshReddy
New Contributor II

Hi, yes, you're thinking in the right direction. Bring the underlying transactional tables into the raw Bronze layer, refine them in Silver, and then in Gold recreate the summary or build a proper star/snowflake model. This way you avoid inconsistencies and are ready when the legacy system is retired.
