Historical Migration of Data from Delta Tables to Delta Live Tables
12-01-2024 09:02 AM
Our team is planning to migrate to the Delta Live Tables (DLT) framework for data ingestion. We currently have Delta tables populated with several years of data from ingested files and want to avoid re-ingesting those files. What is the best approach for transitioning this historical data into Delta Live Tables? Additionally, since DLT enforces that a single pipeline manages writes to a Delta Live Table and maintains an event log for proper commits, how can we efficiently handle historical loads, which are straightforward inserts, while also supporting daily loads that require different ingestion logic?
- Labels: Delta Lake
12-03-2024 11:01 PM
For backfills, you can add a flow definition to the existing pipeline. Otherwise, you can do a bulk insert into the table from outside of the DLT pipeline with INSERT INTO and its equivalents.
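A minimal sketch of the flow-based approach, assuming a Python pipeline. The table name raw_events, the main.default catalog/schema, the landing path, and the Auto Loader settings are all placeholders, and the once=True option assumes a DLT release that supports one-time backfill flows:

```python
import dlt

# Streaming table that both flows append to ("raw_events" is a placeholder).
dlt.create_streaming_table("raw_events")

# Ongoing daily ingestion via Auto Loader (path and format are hypothetical).
@dlt.append_flow(target="raw_events", name="daily_ingest")
def daily_ingest():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/default/landing/daily/")
    )

# One-time backfill from the existing Delta table; once=True asks DLT to
# run this flow a single time and record it in the event log, rather than
# re-running it on every pipeline update.
@dlt.append_flow(target="raw_events", name="historical_backfill", once=True)
def historical_backfill():
    return spark.read.table("main.default.historical_events")
```

Because each flow has its own definition, the backfill's straightforward insert and the daily load's different ingestion logic can coexist in the same pipeline against one target table.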
12-04-2024 12:12 AM
Hey @nalindhar,
I assume you want the target table to keep the same name once you declare your ETL operations in the DLT pipeline. If that's the case, begin by renaming your existing Delta table to indicate that it contains historical data; you can do this with the ALTER TABLE syntax. Then reuse the original table name in your CREATE LIVE TABLE definition and attach all of the data ingestion logic to that table definition.
Additionally, as cgrant mentioned, use an append_flow definition to insert the historical data into your materialized view.
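A minimal sketch of the rename step, assuming Unity Catalog three-part names; events and events_history are placeholder table names:

```python
# One-off command, run outside the DLT pipeline: keep the historical data
# under a new name so the pipeline can take over the original table name.
spark.sql(
    "ALTER TABLE main.default.events RENAME TO main.default.events_history"
)
```

The pipeline can then declare a table named events (CREATE LIVE TABLE events ... in SQL, or @dlt.table(name="events") in Python) and backfill it from events_history with an append_flow like the one sketched in the earlier reply.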

