Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

History load from Source and Incremental Load

maddan80
New Contributor II

Hi,

As part of our requirements, we want to load a large volume of historical data from the source system into Bronze in Databricks and then process it through to Gold. Our plan is to use a batch read/write for the one-time historical load, then switch to readStream/writeStream with a checkpoint on the same table for the delta (incremental) load, so that incremental tracking happens automatically. We chose this split because streaming was not feasible for the historical load, while the incremental load will run frequently, every 15 minutes. Any suggestions on how this can be implemented?
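A minimal sketch of this batch-then-stream handover might look like the following, assuming the source is exposed as a Delta table (see Lakshay's question below); all table names, paths, and the checkpoint location are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder source table; adjust to your environment.
SOURCE = "source_catalog.source_schema.events"

# 1) Pin the source version first, so the batch load and the stream
#    line up without gaps or duplicates.
last_version = (spark.sql(f"DESCRIBE HISTORY {SOURCE} LIMIT 1")
                .collect()[0]["version"])

# 2) One-time historical load: plain batch read/write of that snapshot.
(spark.read
    .option("versionAsOf", last_version)
    .table(SOURCE)
    .write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("bronze.events"))

# 3) Incremental load: stream only changes after that version; the
#    checkpoint then tracks progress automatically across restarts.
(spark.readStream
    .option("startingVersion", last_version + 1)
    .table(SOURCE)
    .writeStream
    .option("checkpointLocation", "/mnt/checkpoints/events_bronze")
    .trigger(processingTime="15 minutes")   # matches the 15-minute cadence
    .toTable("bronze.events"))
```

Note that once the checkpoint exists, the stream resumes from it and `startingVersion` is only honored on the very first run.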

3 REPLIES

Lakshay
Databricks Employee

What is the size of your historical load, and are you loading the historical data from a Delta table?

maddan80
New Contributor II

Around 2.5 billion records, roughly 1 TB.

MariuszK
Contributor III

I imported 16 TB of data using ADF. In this scenario, I'd create a process that extracts the data from the source using ADF and then executes the rest of the logic to populate the Gold tables. For the new data, I'd create a separate process using Auto Loader.
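A minimal sketch of that separate Auto Loader process might look like this; the storage path, table name, and checkpoint location are placeholders, and `cloudFiles.includeExistingFiles` is set to false so files already covered by the ADF historical load are skipped:

```python
# In a Databricks notebook, `spark` is predefined; names below are placeholders.
(spark.readStream
    .format("cloudFiles")                                # Auto Loader source
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.includeExistingFiles", "false")  # skip history already loaded via ADF
    .load("abfss://landing@<storage-account>.dfs.core.windows.net/events/")
    .writeStream
    .option("checkpointLocation", "/mnt/checkpoints/events_autoloader")
    .trigger(processingTime="15 minutes")
    .toTable("bronze.events"))
```

Auto Loader's checkpoint then tracks which files have been ingested, so each 15-minute trigger picks up only newly arrived files.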