Hello @TinaDouglass
Good day!
First of all, welcome to Databricks, a unified data and analytics platform.
I took some time to answer your query in detail.
Use Databricks' medallion architecture (also called bronze-silver-gold layers) to structure your data pipeline. This is a best-practice pattern for ingesting raw data, refining it, and creating consumable aggregates. It ensures consistency, traceability, and flexibility, and all three layers can be governed through Databricks Unity Catalog.
Step 1: Data Ingestion (Bronze Layer - Raw Transactional Data)
Ingest the transactional tables from your legacy system into Databricks as-is (raw format). Use Databricks ingestion tools like Auto Loader for streaming/batch file loads, or a JDBC connection for direct pulls if the legacy system supports it. Store the data in Delta format, which is optimised and gives you ACID guarantees, so you can keep the raw records as an immutable source of truth.
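Here is a minimal Auto Loader sketch to give you the idea; the paths, file format, and table names are placeholders you would adapt (`spark` is the SparkSession that Databricks notebooks provide):

```python
from pyspark.sql import functions as F

raw_path = "/Volumes/legacy/raw/transactions/"        # hypothetical landing zone
checkpoint = "/Volumes/legacy/_checkpoints/bronze/"   # hypothetical checkpoint path

(spark.readStream
    .format("cloudFiles")                             # Auto Loader source
    .option("cloudFiles.format", "csv")               # adjust to your export format
    .option("cloudFiles.schemaLocation", checkpoint)  # lets Auto Loader track the schema
    .load(raw_path)
    .withColumn("_ingested_at", F.current_timestamp())  # lineage/audit column
    .writeStream
    .option("checkpointLocation", checkpoint)
    .trigger(availableNow=True)                       # process the backlog, then stop
    .toTable("main.bronze.transactions"))             # Unity Catalog three-level name
```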
Step 2: Data Cleansing and Transformation (Silver Layer - Refined Data)
Process the bronze data to create cleaned, validated transactional records. Apply business rules, deduplication, and enrichment here.
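As an illustration, a silver transform could look like the sketch below; the column names and rules are hypothetical, so swap in your own business keys and validations:

```python
from pyspark.sql import functions as F

bronze = spark.read.table("main.bronze.transactions")

silver = (bronze
    .filter(F.col("amount").isNotNull() & (F.col("amount") > 0))  # basic validation rule
    .withColumn("txn_date", F.to_date("txn_timestamp"))           # normalise types for reporting
    .dropDuplicates(["transaction_id"]))                          # deduplicate on the business key

silver.write.mode("overwrite").saveAsTable("main.silver.transactions")
```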
Step 3: Aggregation and Summarization (Gold Layer - Business-Ready Metrics)
Summarize the silver data into metrics matching your legacy summary table's structure, plus any enhancements for the new dashboard. From the gold layer you can share this curated data with data analysts, data scientists, and ML teams to get the best out of it.
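For example, assuming a daily-sales grain (again, the column and table names are placeholders):

```python
from pyspark.sql import functions as F

silver = spark.read.table("main.silver.transactions")

gold = (silver
    .groupBy("txn_date", "store_id")                   # grain of the legacy summary table
    .agg(F.sum("amount").alias("total_sales"),
         F.count("transaction_id").alias("txn_count"),
         F.avg("amount").alias("avg_ticket")))

gold.write.mode("overwrite").saveAsTable("main.gold.daily_sales_summary")
```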
Finally, Databricks supports incremental processing (for example with Structured Streaming or Delta Live Tables), which reads and updates only newly arrived data instead of reprocessing everything. You can also run these pipelines on serverless compute, which is cost-effective, simplifies operations, and supports both batch and streaming workloads.
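To show the idea, here is the same gold build expressed incrementally; thanks to the checkpoint, each run only reads silver rows added since the last run (the checkpoint path is a placeholder):

```python
from pyspark.sql import functions as F

(spark.readStream
    .table("main.silver.transactions")        # streaming read picks up only new rows
    .groupBy("txn_date", "store_id")
    .agg(F.sum("amount").alias("total_sales"))
    .writeStream
    .option("checkpointLocation", "/Volumes/legacy/_checkpoints/gold/")
    .outputMode("complete")                   # streaming aggregations need complete/update mode
    .trigger(availableNow=True)               # job-friendly: drain new data, then stop
    .toTable("main.gold.daily_sales_summary"))
```

If you attach this to a scheduled job on serverless compute, each run only pays for the new data it processes.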