How to Incorporate Historical Data in Delta Live Pipeline?

Blake
New Contributor III

Now that Delta Live Tables is GA, we are looking to convert our existing processes to leverage it. One thing that remains unclear is how to populate new Delta Live tables with historical data.

Currently we are looking to use CDC, leveraging create_target_table and apply_changes into a bronze and a silver layer to keep history going forward. When trying to merge into the create_target_table target outside of the DLT pipeline, I get an error saying it must be a Delta table and not a view.
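For reference, this is roughly the shape of what we tried. Table names, the storage path, and the key/ordering columns are all illustrative placeholders, not our real schema:

```python
import dlt
from pyspark.sql.functions import col

# Hypothetical raw CDC feed landing in cloud storage (path is illustrative).
@dlt.view
def customer_updates():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/customers")
    )

# create_target_table declares a pipeline-managed target; merging into it
# from outside the pipeline fails with the "must be a Delta table and not
# a view" error described above.
dlt.create_target_table("customers_silver")

dlt.apply_changes(
    target="customers_silver",
    source="customer_updates",
    keys=["customer_id"],         # assumed primary key
    sequence_by=col("event_ts"),  # assumed ordering column
)
```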

I have also attempted to drop the view and recreate it as a managed Delta table. I am able to populate that table with the historical data, but then I cannot use it in the DLT pipeline.

The other option I am considering is having the DLT pipeline execute a different set of code that pulls from the existing Delta tables once, then switching to the daily code afterwards (see the sketch below).
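Roughly what I have in mind for that option, assuming a hypothetical run_mode pipeline configuration key and illustrative table names; whether the batch read can stand in for the streaming source depends on how the downstream tables consume it:

```python
import dlt

# Hypothetical pipeline configuration key, set in the DLT pipeline
# settings: "backfill" for the one-time historical load, anything
# else for the daily incremental runs.
run_mode = spark.conf.get("run_mode", "incremental")

@dlt.view
def orders_source():
    if run_mode == "backfill":
        # One-time batch read of the existing managed Delta table
        # holding the ~150M historical rows (name is illustrative).
        return spark.read.table("warehouse.orders_history")
    # Daily path: stream new rows only (name is illustrative).
    return spark.readStream.table("warehouse.orders_raw")
```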

We have ~150M rows in a Delta table that we would like to incorporate into the DLT pipeline. How can we populate the DLT bronze and silver layers with historical data from a managed Delta table? I would like to avoid re-running the entire ETL process for all rows. Thanks!


4 REPLIES

Kaniz
Community Manager

Hi @Blake Brown, this guide demonstrates how you can leverage Change Data Capture in Delta Live Tables pipelines to identify new records and capture changes made to the data sets in your data lake.

Delta Live Tables pipelines enable you to develop scalable, reliable, and low-latency data pipelines while performing Change Data Capture in your data lake with the minimum required compute resources and seamless out-of-order data handling.

Note: We recommend first following the Getting Started with Delta Live Tables guide, which explains creating scalable and reliable pipelines using Delta Live Tables (DLT) and its declarative ETL definitions.

Kaniz
Community Manager

Hi @Blake Brown, we haven't heard from you since my last response, and I was checking back to see whether you have a resolution yet. If you have found a solution, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.

Blake
New Contributor III (Accepted Solution)

@Kaniz Fatma Hello, sorry for the delayed response. The guide does not answer how to incorporate existing Delta tables that contain historical data into a Delta Live Tables pipeline. We ended up changing the source data to pull from the existing bronze table as a workaround; a sketch of that approach is below.
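Roughly, the workaround looks like this (the table name is illustrative). The streaming read over the pre-existing Delta table picks up the full history on the first pipeline run and only new appends afterwards:

```python
import dlt

@dlt.table
def silver_events():
    # Streaming read of the pre-existing bronze Delta table: the first
    # run ingests the full ~150M-row history once; later runs pick up
    # only newly appended rows.
    return spark.readStream.table("existing_bronze")  # illustrative name
```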

Thank you for your reply. Marking your response as best.
