Scaling Declarative Streaming Pipelines for CDC from On-Prem Database to Lakehouse

vartyg — Tue, 04 Nov 2025 19:34:42 GMT

We have a scenario where we need to mirror thousands of tables from on-premises Db2 databases to an Azure Lakehouse. The goal is to create mirror Delta tables in the Lakehouse.

Since LakeFlow Connect currently does not support direct mirroring from on-prem Db2, we are using Qlik Replicate to capture CDC data and land it in ADLS Gen2 in Parquet format — one table per folder.

We then created a Declarative Pipeline in Databricks using Auto Loader to read the CDC files in streaming mode into a staging bronze streaming table. From there, we use Auto-CDC to apply SCD Type 1 logic and write to the final bronze streaming table, running the pipeline in continuous mode.

The challenge is:
To stream thousands of tables, we would need to create thousands of individual declarative streaming pipelines, which is not scalable.

We considered using a configuration table to loop through the list of source tables dynamically, but this would require scheduling the pipeline — and we want to keep it in continuous mode, not scheduled.

Question:
Is there a scalable solution or pattern in Databricks to dynamically stream CDC data for thousands of tables using a single or minimal number of declarative pipelines, while keeping the pipeline in continuous mode?

Any guidance or best practices would be appreciated!

Re: Scaling Declarative Streaming Pipelines for CDC from On-Prem Database to Lakehouse

bidek56 — Tue, 04 Nov 2025 19:49:22 GMT

Just use https://flink.apache.org

Re: Scaling Declarative Streaming Pipelines for CDC from On-Prem Database to Lakehouse

AbhaySingh — Wed, 05 Nov 2025 17:49:13 GMT

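Yes, the Databricks Labs project DLT-META looks like a good fit for your scenario. It is a metadata-driven framework that generates the tables of a declarative pipeline from an onboarding configuration, so you do not need one pipeline per source table.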

https://databrickslabs.github.io/dlt-meta/index.html
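
If you would rather see the underlying idea before adopting a framework, here is a minimal sketch of the metadata-driven pattern, not DLT-META's actual code. It assumes the table list is available when the pipeline graph is built, so a plain Python loop can register one Auto Loader staging table and one SCD Type 1 target per entry inside a single pipeline, which can still run in continuous mode. Everything named here is a hypothetical placeholder: `TableSpec`, `build_specs`, `register_table`, the `CONFIG` rows, the storage paths, the key columns, and the `header__change_seq` ordering column (check what your Qlik Replicate task actually writes). Delete handling is left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

try:
    import dlt  # only importable inside a Databricks declarative pipeline
except ImportError:
    dlt = None


@dataclass(frozen=True)
class TableSpec:
    """One source table to mirror (hypothetical config shape)."""
    name: str                 # target table name in the Lakehouse
    source_path: str          # folder where the CDC Parquet files land
    keys: Tuple[str, ...]     # primary-key columns used for the upsert
    sequence_by: str = "header__change_seq"  # assumed ordering column


def staging_name(spec: TableSpec) -> str:
    """Name of the intermediate streaming table fed by Auto Loader."""
    return f"{spec.name}_cdc_staging"


def build_specs(rows: Iterable[Dict]) -> List[TableSpec]:
    """Validate raw config rows and turn them into TableSpec objects."""
    specs, seen = [], set()
    for row in rows:
        if not row["keys"]:
            raise ValueError(f"{row['name']}: at least one key column is required")
        if row["name"] in seen:
            raise ValueError(f"duplicate table name: {row['name']}")
        seen.add(row["name"])
        specs.append(TableSpec(row["name"], row["source_path"], tuple(row["keys"])))
    return specs


def register_table(spec: TableSpec, dlt_module, spark_session) -> None:
    """Register the staging table and the SCD Type 1 flow for one spec.

    Taking `spec` as a function argument gives each generated table its own
    binding, which avoids the late-binding closure bug of defining the
    decorated function directly inside a for loop.
    """
    @dlt_module.table(name=staging_name(spec))
    def _staging():
        return (
            spark_session.readStream.format("cloudFiles")
            .option("cloudFiles.format", "parquet")
            .load(spec.source_path)
        )

    dlt_module.create_streaming_table(spec.name)
    dlt_module.apply_changes(
        target=spec.name,
        source=staging_name(spec),
        keys=list(spec.keys),
        sequence_by=spec.sequence_by,
        stored_as_scd_type=1,
    )


# Hypothetical config; in practice this could be read from a config table
# or a JSON file when the pipeline starts.
CONFIG = [
    {"name": "customer", "source_path": "/landing/qlik/CUSTOMER/", "keys": ["CUSTOMER_ID"]},
    {"name": "orders", "source_path": "/landing/qlik/ORDERS/", "keys": ["ORDER_ID"]},
]
SPECS = build_specs(CONFIG)

# `spark` is provided as a global inside a pipeline notebook; outside
# Databricks this block is skipped and only the specs are built.
if dlt is not None and "spark" in globals():
    for spec in SPECS:
        register_table(spec, dlt, globals()["spark"])
```

Because the loop runs while the pipeline graph is being defined, newly added config rows are only picked up when the pipeline update restarts, not while it is running. A single pipeline may also not comfortably hold thousands of flows, so it is probably worth splitting the config across a handful of pipelines (for example by schema or source database) rather than aiming for exactly one.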
