Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

History load from Source and Incremental Load

maddan80
New Contributor II

Hi,

As part of our requirements, we want to load a large volume of historical data from the source system into Bronze in Databricks and then process it through to Gold. Our plan is to use a batch read/write for the one-time historical load, then switch to readStream/writeStream with a checkpoint on the same table for the delta (incremental) load, so that incremental tracking happens automatically. We chose this split because streaming was not feasible for the historical load, while the incremental load will run frequently, every 15 minutes. Any suggestions on how this can be implemented?
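A minimal sketch of this batch-then-stream handover might look like the following, assuming the source is exposed as a Delta table (see Lakshay's question below); all table names, paths, and the checkpoint location are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder source table; adjust to your environment.
SOURCE = "source_catalog.source_schema.events"

# 1) Pin the source version first, so the batch load and the stream
#    line up without gaps or duplicates.
last_version = (spark.sql(f"DESCRIBE HISTORY {SOURCE} LIMIT 1")
                .collect()[0]["version"])

# 2) One-time historical load: plain batch read/write of that snapshot.
(spark.read
    .option("versionAsOf", last_version)
    .table(SOURCE)
    .write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("bronze.events"))

# 3) Incremental load: stream only changes after that version; the
#    checkpoint then tracks progress automatically across restarts.
(spark.readStream
    .option("startingVersion", last_version + 1)
    .table(SOURCE)
    .writeStream
    .option("checkpointLocation", "/mnt/checkpoints/events_bronze")
    .trigger(processingTime="15 minutes")   # matches the 15-minute cadence
    .toTable("bronze.events"))
```

Note that once the checkpoint exists, the stream resumes from it and `startingVersion` is only honored on the very first run.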

3 REPLIES

Lakshay
Databricks Employee

What is the size of your historical load, and are you loading the historical data from a Delta table?

maddan80
New Contributor II

Around 2.5 billion records, roughly 1 TB.

MariuszK
Contributor III

I imported 16 TB of data using ADF. In this scenario, I'd create a process that extracts the data from the source using ADF and then executes the rest of the logic to populate the Gold tables. For the new data, I'd create a separate process using Auto Loader.
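A minimal sketch of that separate Auto Loader process might look like this; the storage path, table name, and checkpoint location are placeholders, and `cloudFiles.includeExistingFiles` is set to false so files already covered by the ADF historical load are skipped:

```python
# In a Databricks notebook, `spark` is predefined; names below are placeholders.
(spark.readStream
    .format("cloudFiles")                                # Auto Loader source
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.includeExistingFiles", "false")  # skip history already loaded via ADF
    .load("abfss://landing@<storage-account>.dfs.core.windows.net/events/")
    .writeStream
    .option("checkpointLocation", "/mnt/checkpoints/events_autoloader")
    .trigger(processingTime="15 minutes")
    .toTable("bronze.events"))
```

Auto Loader's checkpoint then tracks which files have been ingested, so each 15-minute trigger picks up only newly arrived files.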