Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Converting Existing Streaming Job to Delta Live Tables with Historical Backfill

yazz
New Contributor II

Description:
I’m migrating a two-stage streaming job into Delta Live Tables (DLT), sketched below:

  • Bronze: read from Pub/Sub → write to Bronze table

  • Silver: use create_auto_cdc_flow on Bronze → upsert into Silver table
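
For context, the current pipeline looks roughly like this; table names, key columns, and Pub/Sub connection options are placeholders (auth options omitted):

```python
import dlt
from pyspark.sql.functions import col

# Bronze: stream raw messages from Pub/Sub into a Bronze table
@dlt.table(name="bronze_events")
def bronze_events():
    return (
        spark.readStream.format("pubsub")
        .option("subscriptionId", "my-subscription")  # placeholder
        .option("topicId", "my-topic")                # placeholder
        .option("projectId", "my-project")            # placeholder
        .load()
    )

# Silver: declare the target table, then upsert changes from Bronze
dlt.create_streaming_table("silver_events")

dlt.create_auto_cdc_flow(
    target="silver_events",
    source="bronze_events",
    keys=["id"],                  # placeholder key column
    sequence_by=col("event_ts"),  # placeholder ordering column
)
```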

New data works perfectly, but I now need to backfill history into the same Silver table. I’m blocked by two DLT constraints:

  1. Single-flow target: you can’t have two separate flows write to the same table

  2. No mixed modes: you can’t combine an append-only flow (preview) with create_auto_cdc_flow on one target

I tried widget-driven conditional logic to switch between backfill and CDC (reconstructed roughly below), but no data is written during backfill.
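
The switch looked something like this; I’ve written it here with a pipeline configuration parameter instead of a notebook widget, and all names are placeholders:

```python
import dlt
from pyspark.sql.functions import col

# Placeholder switch: "pipeline.mode" would be set in the pipeline configuration
mode = spark.conf.get("pipeline.mode", "cdc")

dlt.create_streaming_table("silver_events")

if mode == "backfill":
    # One-time load of history into Silver
    @dlt.append_flow(target="silver_events")
    def backfill_history():
        return spark.read.table("historical_backup")  # placeholder source
else:
    # Normal operation: CDC upserts from Bronze
    dlt.create_auto_cdc_flow(
        target="silver_events",
        source="bronze_events",
        keys=["id"],
        sequence_by=col("event_ts"),
    )
```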

Request:
Has anyone backfilled historical data into a DLT-managed CDC table? What workarounds or patterns did you use to load history without conflicting with the live CDC flow? Any code snippets or best practices welcome.

2 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @yazz ,

I’m wondering if you could use an approach similar to the one in the article below: just backfill your Bronze table first, and the downstream Silver and Gold layers will pick up the new data from Bronze. With that approach you don’t need workarounds for the DLT constraints (single-flow target and no mixed modes).

Backfilling historical data with Lakeflow Declarative Pipelines - Azure Databricks | Microsoft Learn
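
For illustration, a minimal sketch of that pattern, assuming the one-time append flow described in the linked article (table names are placeholders; see the article for the exact API):

```python
import dlt

# One-time flow that appends historical rows into the existing Bronze table.
# With once=True the flow runs a single time (and again only on a full refresh).
@dlt.append_flow(target="bronze_events", once=True)
def backfill_bronze():
    return spark.read.table("historical_backup")  # placeholder source table
```

The Silver CDC flow then consumes the backfilled rows from Bronze like any other new data.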

yazz
New Contributor II

Thanks for the reply, Szymon. 

Yes, I have seen the example, but it’s just that my Bronze table’s schema is different from my historical backup table’s. The Bronze table has 4 columns holding JSON data, while the historical table has 11 columns of structured data, so I can’t load the history into Bronze as-is.

Also, the example only covers append_flow. Technically, I need one flow that appends the backup table into Silver exactly once, and a second flow that keeps upserting into the same Silver table, roughly like this:
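
Sketched out (with placeholder names), this is exactly the combination that DLT rejects:

```python
import dlt
from pyspark.sql.functions import col

dlt.create_streaming_table("silver_events")

# One-time append of history straight into Silver...
@dlt.append_flow(target="silver_events", once=True)
def backfill_silver():
    return spark.read.table("historical_backup")  # placeholder source

# ...combined with ongoing CDC upserts into the same Silver target.
# Mixing an append flow and an auto CDC flow on one target is what DLT disallows.
dlt.create_auto_cdc_flow(
    target="silver_events",
    source="bronze_events",
    keys=["id"],
    sequence_by=col("event_ts"),
)
```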
