<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Scaling Declarative Streaming Pipelines for CDC from On-Prem Database to Lakehouse in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/scaling-declarative-streaming-pipelines-for-cdc-from-on-prem/m-p/137815#M50820</link>
    <description>&lt;P&gt;Yes, the Databricks Labs project DLT-META looks like a good fit for your scenario.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://databrickslabs.github.io/dlt-meta/index.html" target="_blank"&gt;https://databrickslabs.github.io/dlt-meta/index.html&lt;/A&gt;&lt;/P&gt;
</description>
    <pubDate>Wed, 05 Nov 2025 17:49:13 GMT</pubDate>
    <dc:creator>AbhaySingh</dc:creator>
    <dc:date>2025-11-05T17:49:13Z</dc:date>
    <item>
      <title>Scaling Declarative Streaming Pipelines for CDC from On-Prem Database to Lakehouse</title>
      <link>https://community.databricks.com/t5/data-engineering/scaling-declarative-streaming-pipelines-for-cdc-from-on-prem/m-p/137637#M50787</link>
      <description>&lt;P&gt;We need to mirror thousands of tables from on-premises Db2 databases to an Azure Lakehouse. The goal is to create mirror Delta tables in the Lakehouse.&lt;/P&gt;&lt;P&gt;Since Lakeflow Connect currently does not support direct mirroring from on-prem Db2, we are using Qlik Replicate to capture CDC data and land it in ADLS Gen2 in Parquet format, with one table per folder.&lt;/P&gt;&lt;P&gt;We then created a Declarative Pipeline in Databricks that uses Auto Loader to read the CDC files in streaming mode into a staging bronze streaming table. From there, we use Auto-CDC to apply SCD Type 1 logic and write to the final bronze streaming table, running the pipeline in continuous mode.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;The challenge is:&lt;/STRONG&gt;&lt;BR /&gt;To stream thousands of tables, we would need to create thousands of individual declarative streaming pipelines, which is not scalable.&lt;/P&gt;&lt;P&gt;We considered using a configuration table to loop through the list of source tables dynamically, but this would require scheduling the pipeline, and we want to keep it in continuous mode rather than scheduled.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Question&lt;/STRONG&gt;:&lt;BR /&gt;Is there a scalable solution or pattern in Databricks to dynamically stream CDC data for thousands of tables using a single pipeline, or a minimal number of declarative pipelines, while keeping the pipeline in continuous mode?&lt;/P&gt;&lt;P&gt;Any guidance or best practices would be appreciated!&lt;/P&gt;</description>
      <pubDate>Tue, 04 Nov 2025 19:34:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/scaling-declarative-streaming-pipelines-for-cdc-from-on-prem/m-p/137637#M50787</guid>
      <dc:creator>vartyg</dc:creator>
      <dc:date>2025-11-04T19:34:42Z</dc:date>
    </item>
    <item>
      <title>Re: Scaling Declarative Streaming Pipelines for CDC from On-Prem Database to Lakehouse</title>
      <link>https://community.databricks.com/t5/data-engineering/scaling-declarative-streaming-pipelines-for-cdc-from-on-prem/m-p/137639#M50788</link>
      <description>&lt;P&gt;Just use Apache Flink: &lt;A href="https://flink.apache.org" target="_blank"&gt;https://flink.apache.org&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 04 Nov 2025 19:49:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/scaling-declarative-streaming-pipelines-for-cdc-from-on-prem/m-p/137639#M50788</guid>
      <dc:creator>bidek56</dc:creator>
      <dc:date>2025-11-04T19:49:22Z</dc:date>
    </item>
    <item>
      <title>Re: Scaling Declarative Streaming Pipelines for CDC from On-Prem Database to Lakehouse</title>
      <link>https://community.databricks.com/t5/data-engineering/scaling-declarative-streaming-pipelines-for-cdc-from-on-prem/m-p/137815#M50820</link>
      <description>&lt;P&gt;Yes, the Databricks Labs project DLT-META looks like a good fit for your scenario.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://databrickslabs.github.io/dlt-meta/index.html" target="_blank"&gt;https://databrickslabs.github.io/dlt-meta/index.html&lt;/A&gt;&lt;/P&gt;
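&lt;P&gt;In case it helps, here is a minimal sketch of the general metadata-driven idea in plain Python. It is not DLT-META's actual API. It only shows how configuration rows could be turned into per-table specs and grouped into a small number of pipelines. Every name in it (TableSpec, build_specs, assign_to_pipelines, define_flows, the "_cdc_staging" suffix and the default "change_seq" ordering column) is hypothetical and chosen for illustration.&lt;/P&gt;
&lt;PRE&gt;
```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class TableSpec:
    """One source table to mirror. All field values are illustrative."""
    source_path: str          # folder where the CDC Parquet files land
    staging_table: str        # staging bronze streaming table
    target_table: str         # final bronze table kept with SCD Type 1
    keys: Tuple[str, ...]     # key columns used to match rows
    sequence_by: str          # column that orders the change events


def build_specs(landing_root: str, config_rows: Iterable[Dict]) -> List[TableSpec]:
    """Turn configuration rows into one TableSpec per source table."""
    root = landing_root.rstrip("/")
    specs = []
    for row in config_rows:
        name = row["table"].lower()
        specs.append(
            TableSpec(
                source_path=f"{root}/{row['table']}",
                staging_table=f"{name}_cdc_staging",
                target_table=name,
                keys=tuple(row["keys"]),
                # "change_seq" is an assumed default, not a known column name.
                sequence_by=row.get("sequence_by", "change_seq"),
            )
        )
    return specs


def assign_to_pipelines(specs: List[TableSpec], tables_per_pipeline: int) -> List[List[TableSpec]]:
    """Split the specs into fixed-size groups, one group per pipeline."""
    if not tables_per_pipeline >= 1:
        raise ValueError("tables_per_pipeline must be at least 1")
    return [
        specs[i:i + tables_per_pipeline]
        for i in range(0, len(specs), tables_per_pipeline)
    ]


def define_flows(specs: Iterable[TableSpec], define_one: Callable[[TableSpec], None]) -> int:
    """Call define_one once per spec and return how many flows were defined.

    In a real pipeline, define_one would hold the Auto Loader read and the
    Auto-CDC definition for that table. Here it is just a callback.
    """
    count = 0
    for spec in specs:
        define_one(spec)
        count += 1
    return count
```
&lt;/PRE&gt;
&lt;P&gt;With this shape, the loop runs once when the pipeline graph is built, so the pipeline itself can stay in continuous mode. As far as I understand, picking up a newly added table would still need a pipeline update, so treat that part as an assumption to verify. The group size passed to assign_to_pipelines is something you would tune against your own workspace limits.&lt;/P&gt;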
</description>
      <pubDate>Wed, 05 Nov 2025 17:49:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/scaling-declarative-streaming-pipelines-for-cdc-from-on-prem/m-p/137815#M50820</guid>
      <dc:creator>AbhaySingh</dc:creator>
      <dc:date>2025-11-05T17:49:13Z</dc:date>
    </item>
  </channel>
</rss>

