Is it possible to migrate data from one DLT pipeline to another?
05-07-2024 09:27 AM
Hi,
We have a DLT pipeline that has been running for a while against a Hive Metastore target and has stored billions of records. We'd like to move the data to Unity Catalog. The documentation says "Existing pipelines that use the Hive metastore cannot be upgraded to use Unity Catalog. To migrate an existing pipeline that writes to Hive metastore, you must create a new pipeline and re-ingest data from the data source(s)." The problem is that the original data sources no longer exist, so we can't just start a new pipeline and re-ingest everything. Is there any way to migrate or copy the data from the existing pipeline into a new one as its starting point, so the new pipeline doesn't have to start from scratch?
05-07-2024 11:53 PM
@MarkD good day!
I'm sorry, but the documentation confirms what you quoted: existing pipelines that use the Hive metastore cannot be upgraded to use Unity Catalog, and migrating one means creating a new pipeline and re-ingesting data from the original data source(s).
The documentation does not mention any way to use data from an existing pipeline as the starting point for a new one, so if the original data sources are no longer available, there is no documented method to migrate or copy that data into the new pipeline.
Doc: https://docs.databricks.com/en/delta-live-tables/unity-catalog.html#limitations
Kind regards,
Yesh