History load from Source and
01-27-2025 04:43 PM
Hi
As part of our requirements, we want to load a large volume of historical data from the source system into Bronze in Databricks and then process it through to Gold. We want to use batch read and write for the historical load, and then use readStream and writeStream with a checkpoint on the same table for the delta (incremental) load, so that incremental progress is tracked automatically. We chose this approach because it was not possible to use streams for the historical load; once that load is done, we want to switch to streams because the incremental load will run frequently, every 15 minutes. Are there any recommended approaches for implementing this?
01-27-2025 09:24 PM
What is the size of your historical load, and are you loading the historical data from a Delta table?
01-28-2025 09:48 AM
Around 2.5 billion records, roughly 1 TB.
01-29-2025 03:01 AM
I imported 16 TB of data using ADF. In this scenario, I'd create one process that uses ADF to extract the historical data from the source and then runs the rest of the logic to populate the Gold tables. For new data, I'd create a separate process using Auto Loader.
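A rough sketch of how that split might look in PySpark, in case it helps. It assumes the extracted data lands as Parquet files in cloud storage, which the thread does not confirm. The function names, paths, and table names are hypothetical. Setting `cloudFiles.includeExistingFiles` to `false` is meant to stop Auto Loader from re-reading the files the batch load already handled; as far as I know it only takes effect the first time the stream starts, so files that arrive between the batch load and that first start could be missed unless they land in a separate folder.

```python
# Sketch only: assumes Parquet files in cloud storage and an existing
# SparkSession on Databricks. All names and paths are hypothetical.

def load_history(spark, source_path, bronze_table):
    """One-off batch load of the historical files into the Bronze Delta table."""
    (
        spark.read.format("parquet")
        .load(source_path)
        .write.format("delta")
        .mode("append")
        .saveAsTable(bronze_table)
    )


def autoloader_options(schema_location):
    """Auto Loader options for the incremental stream."""
    return {
        "cloudFiles.format": "parquet",
        # Auto Loader stores the inferred schema here.
        "cloudFiles.schemaLocation": schema_location,
        # Skip files already present when the stream first starts,
        # since the batch load is assumed to have ingested them.
        "cloudFiles.includeExistingFiles": "false",
    }


def load_increment(spark, source_path, bronze_table, checkpoint, schema_location):
    """Process whatever is new since the last run, then stop."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(schema_location))
        .load(source_path)
        .writeStream
        # The checkpoint is what tracks incremental progress between runs.
        .option("checkpointLocation", checkpoint)
        # Drain the backlog and exit, so a job can call this every 15 minutes.
        .trigger(availableNow=True)
        .toTable(bronze_table)
    )
```

You would call `load_history` once and then schedule `load_increment` as a job every 15 minutes, always with the same checkpoint location. The Bronze to Gold step could follow the same pattern, reading the Bronze table as a stream with its own checkpoint.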