<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic DLT Continuous Pipeline load in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-continuous-pipeline-load/m-p/143846#M52229</link>
    <description>&lt;P&gt;Hi All,&lt;BR /&gt;In our project we are working on a DLT pipeline that runs in continuous mode with DLT tables as its targets.&lt;BR /&gt;These tables are shared across multiple countries, and we go live country by country, in batches.&lt;/P&gt;&lt;P&gt;Every time the business requests a change that alters the metadata of a DLT table, we update the table definition in the notebook and are then forced to run the pipeline with a full refresh.&lt;/P&gt;&lt;P&gt;This is becoming a major concern because:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;A full refresh can only reload data that still exists in the corresponding source system, so any history that was already in the table before the refresh, but is no longer in the source, may be lost.&lt;/LI&gt;&lt;LI&gt;Our SLA suffers because reprocessing the complete 2 years of data takes a very long time, sometimes close to 7 hours.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Is there a way to keep working with these DLT pipelines while applying metadata updates from the current point in time onward, without a full refresh? Please help.&lt;/P&gt;</description>
    <pubDate>Tue, 13 Jan 2026 09:42:12 GMT</pubDate>
    <dc:creator>JothyGanesan</dc:creator>
    <dc:date>2026-01-13T09:42:12Z</dc:date>
    <item>
      <title>DLT Continuous Pipeline load</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-continuous-pipeline-load/m-p/143846#M52229</link>
      <pubDate>Tue, 13 Jan 2026 09:42:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-continuous-pipeline-load/m-p/143846#M52229</guid>
      <dc:creator>JothyGanesan</dc:creator>
      <dc:date>2026-01-13T09:42:12Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Continuous Pipeline load</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-continuous-pipeline-load/m-p/143927#M52239</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/134682"&gt;@JothyGanesan&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Use dynamic schema handling and selective table updates to apply metadata changes incrementally from the current watermark, preserving history across country go-lives.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;In the bronze layer, read the sources with Auto Loader and rely on its schema inference and evolution instead of hard-coding the schema in static @dlt.table definitions. New source columns are picked up without code changes, and values that don't match the inferred types are captured in the _rescued_data column rather than failing the stream.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ManojkMohan_0-1768326383186.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/22951iF65D78021326C6F6/image-size/medium?v=v2&amp;amp;px=400" role="button" title="ManojkMohan_0-1768326383186.png" alt="ManojkMohan_0-1768326383186.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Metadata change workflow (no full refresh):&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Update the silver/gold logic (add/drop columns, new CTEs).&lt;/LI&gt;&lt;LI&gt;Start a regular pipeline update (not a full refresh). The changed tables pick up the new logic, while streaming tables keep their existing rows and apply the change only to newly arriving data.&lt;/LI&gt;&lt;LI&gt;Bronze Auto Loader resumes incrementally, using the existing schema location and checkpoints.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ManojkMohan_1-1768326512579.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/22952i3497B9601B5546A6/image-size/medium?v=v2&amp;amp;px=400" role="button" title="ManojkMohan_1-1768326512579.png" alt="ManojkMohan_1-1768326512579.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;New country? 
Update the isin() filter and the change is applied incrementally: only data arriving after the update is affected, so rows the stream had already read and filtered out are not reprocessed.&lt;/P&gt;&lt;P&gt;Validate before go-live:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ManojkMohan_2-1768326565633.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/22953i3D641C726DAEA5BE/image-size/medium?v=v2&amp;amp;px=400" role="button" title="ManojkMohan_2-1768326565633.png" alt="ManojkMohan_2-1768326565633.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;This avoids the 7-hour full refreshes: new metadata propagates in roughly 10-30 minutes, depending on the daily delta volume, and the pipeline can stay in continuous mode throughout the change. I'd suggest testing with one country first.&lt;/P&gt;</description>
      <pubDate>Tue, 13 Jan 2026 17:50:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-continuous-pipeline-load/m-p/143927#M52239</guid>
      <dc:creator>ManojkMohan</dc:creator>
      <dc:date>2026-01-13T17:50:03Z</dc:date>
    </item>
    <item>
      <title>Hi @JothyGanesan, This is a common scenario when running...</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-continuous-pipeline-load/m-p/150241#M53313</link>
      <description>Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/134682"&gt;@JothyGanesan&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;This is a common scenario when running Lakeflow Spark Declarative Pipelines (SDP), previously known as DLT, in continuous mode across multi-country rollouts. There are several strategies for handling metadata changes on your streaming tables without resorting to a full refresh each time.&lt;BR /&gt;&lt;BR /&gt;UNDERSTANDING WHAT REQUIRES A FULL REFRESH&lt;BR /&gt;&lt;BR /&gt;Not all metadata changes require a full refresh. Here is a breakdown:&lt;BR /&gt;&lt;BR /&gt;Changes that typically DO NOT require a full refresh:&lt;BR /&gt;- Adding new columns to the end of a streaming table definition&lt;BR /&gt;- Changing table-level properties (TBLPROPERTIES, comments, tags, etc.)&lt;BR /&gt;- Adding or modifying column comments&lt;BR /&gt;- Setting column-level tags or masks via ALTER STREAMING TABLE&lt;BR /&gt;&lt;BR /&gt;Changes that typically DO require a full refresh:&lt;BR /&gt;- Renaming existing columns&lt;BR /&gt;- Dropping columns&lt;BR /&gt;- Changing column data types&lt;BR /&gt;- Fundamentally restructuring the query logic&lt;BR /&gt;&lt;BR /&gt;STRATEGY 1: USE DELTA COLUMN MAPPING&lt;BR /&gt;&lt;BR /&gt;Enable Delta column mapping on your tables by setting this table property in your pipeline definition:&lt;BR /&gt;&lt;BR /&gt;    TBLPROPERTIES (&lt;BR /&gt;      'delta.columnMapping.mode' = 'name'&lt;BR /&gt;    )&lt;BR /&gt;&lt;BR /&gt;With column mapping enabled, renaming or dropping a column becomes a metadata-only change: the underlying data files are not rewritten. Note that streaming reads from a table whose columns have been renamed or dropped may need a schema tracking location to be configured. 
When used inside SDP, this is managed automatically.&lt;BR /&gt;&lt;BR /&gt;STRATEGY 2: USE APPEND FLOWS TO ADD NEW SOURCES WITHOUT FULL REFRESH&lt;BR /&gt;&lt;BR /&gt;If the metadata change involves adding new data sources (for example, new countries going live), use append flows rather than modifying the main query. This lets you add new streaming sources to an existing streaming table without triggering a full refresh.&lt;BR /&gt;&lt;BR /&gt;Python example:&lt;BR /&gt;&lt;BR /&gt;    from pyspark import pipelines as dp&lt;BR /&gt;&lt;BR /&gt;    # One target table, fed by one append flow per country&lt;BR /&gt;    dp.create_streaming_table("customers_silver")&lt;BR /&gt;&lt;BR /&gt;    @dp.append_flow(target="customers_silver")&lt;BR /&gt;    def country_a_flow():&lt;BR /&gt;        # spark is provided by the pipeline runtime on Databricks&lt;BR /&gt;        return spark.readStream.table("country_a_bronze")&lt;BR /&gt;&lt;BR /&gt;    @dp.append_flow(target="customers_silver")&lt;BR /&gt;    def country_b_flow():&lt;BR /&gt;        return spark.readStream.table("country_b_bronze")&lt;BR /&gt;&lt;BR /&gt;SQL example:&lt;BR /&gt;&lt;BR /&gt;    CREATE OR REFRESH STREAMING TABLE customers_silver;&lt;BR /&gt;&lt;BR /&gt;    CREATE FLOW country_a_flow&lt;BR /&gt;    AS INSERT INTO customers_silver BY NAME&lt;BR /&gt;    SELECT * FROM STREAM(country_a_bronze);&lt;BR /&gt;&lt;BR /&gt;    CREATE FLOW country_b_flow&lt;BR /&gt;    AS INSERT INTO customers_silver BY NAME&lt;BR /&gt;    SELECT * FROM STREAM(country_b_bronze);&lt;BR /&gt;&lt;BR /&gt;Each new country can be added as a new append flow: the new flow backfills from its own source, and the data already in the target table remains untouched.&lt;BR /&gt;&lt;BR /&gt;STRATEGY 3: PROTECT CRITICAL TABLES WITH pipelines.reset.allowed&lt;BR /&gt;&lt;BR /&gt;To prevent accidental full refreshes on tables with expensive historical data, set:&lt;BR /&gt;&lt;BR /&gt;    TBLPROPERTIES (&lt;BR /&gt;      pipelines.reset.allowed = false&lt;BR /&gt;
    )&lt;BR /&gt;&lt;BR /&gt;With this property set, a full refresh does not clear the table: its existing data is kept, and new data continues to flow in incrementally. This is a safety net while you work on the metadata change approach.&lt;BR /&gt;&lt;BR /&gt;STRATEGY 4: SEPARATE INGESTION FROM TRANSFORMATION&lt;BR /&gt;&lt;BR /&gt;Consider structuring your pipeline in two layers:&lt;BR /&gt;&lt;BR /&gt;1. A raw/bronze streaming table that ingests data as-is (minimal transformation, stable schema)&lt;BR /&gt;2. Downstream materialized views or streaming tables that apply the business metadata and transformations&lt;BR /&gt;&lt;BR /&gt;When business metadata changes, you only need to update the downstream layer. Materialized views are recomputed from their upstream tables when their definition changes (incrementally where supported), so query changes do not require a full refresh of the raw data. The upstream streaming table with the raw data stays untouched.&lt;BR /&gt;&lt;BR /&gt;STRATEGY 5: USE ALTER STREAMING TABLE FOR SUPPORTED CHANGES&lt;BR /&gt;&lt;BR /&gt;For certain metadata changes, you can use ALTER STREAMING TABLE without modifying the pipeline definition at all:&lt;BR /&gt;&lt;BR /&gt;    ALTER STREAMING TABLE my_table&lt;BR /&gt;      ALTER COLUMN my_column COMMENT 'Updated description';&lt;BR /&gt;&lt;BR /&gt;    ALTER STREAMING TABLE my_table&lt;BR /&gt;      SET TAGS ('region' = 'EMEA', 'version' = '2.0');&lt;BR /&gt;&lt;BR /&gt;These changes are applied immediately without requiring any refresh.&lt;BR /&gt;&lt;BR /&gt;RECOMMENDED APPROACH FOR YOUR SCENARIO&lt;BR /&gt;&lt;BR /&gt;Given that you have a multi-country continuous pipeline with about 2 years of historical data:&lt;BR /&gt;&lt;BR /&gt;1. Set pipelines.reset.allowed = false on all critical streaming tables as an immediate safety measure&lt;BR /&gt;2. Enable delta.columnMapping.mode = 'name' on your tables to unlock metadata-only column operations&lt;BR /&gt;3. Use append flows for new country onboarding, so each country is a separate flow feeding the same target table&lt;BR /&gt;4. 
Move business metadata logic into downstream materialized views where changes do not require a full refresh of the streaming data&lt;BR /&gt;5. For any column additions, add them to the end of your schema definition, which is handled incrementally&lt;BR /&gt;&lt;BR /&gt;DOCUMENTATION REFERENCES&lt;BR /&gt;&lt;BR /&gt;- Pipeline update types and full refresh: &lt;A href="https://docs.databricks.com/aws/en/delta-live-tables/updates" target="_blank"&gt;https://docs.databricks.com/aws/en/delta-live-tables/updates&lt;/A&gt;&lt;BR /&gt;- Table properties including pipelines.reset.allowed: &lt;A href="https://docs.databricks.com/aws/en/delta-live-tables/properties" target="_blank"&gt;https://docs.databricks.com/aws/en/delta-live-tables/properties&lt;/A&gt;&lt;BR /&gt;- Append flows: &lt;A href="https://docs.databricks.com/aws/en/delta-live-tables/flows" target="_blank"&gt;https://docs.databricks.com/aws/en/delta-live-tables/flows&lt;/A&gt;&lt;BR /&gt;- Delta column mapping: &lt;A href="https://docs.databricks.com/aws/en/delta/column-mapping.html" target="_blank"&gt;https://docs.databricks.com/aws/en/delta/column-mapping.html&lt;/A&gt;&lt;BR /&gt;- Pipeline modes (continuous vs triggered): &lt;A href="https://docs.databricks.com/aws/en/delta-live-tables/pipeline-mode" target="_blank"&gt;https://docs.databricks.com/aws/en/delta-live-tables/pipeline-mode&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Note: The product previously known as "DLT" or "Delta Live Tables" is now called Lakeflow Spark Declarative Pipelines (SDP).&lt;BR /&gt;&lt;BR /&gt;* This reply used an agent system I built to research and draft this response, based on the broad set of documentation available to me and on prior memory. 
I personally review each draft for obvious issues, monitor the system's reliability, and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand-new features.&lt;BR /&gt;&lt;BR /&gt;If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.</description>
      <pubDate>Sun, 08 Mar 2026 18:52:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-continuous-pipeline-load/m-p/150241#M53313</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-08T18:52:18Z</dc:date>
    </item>
  </channel>
</rss>

