Hi
@JothyGanesan,
This is a common scenario when running Lakeflow Spark Declarative Pipelines (SDP), previously known as DLT, in continuous mode across multi-country rollouts. There are several strategies to handle metadata changes on your streaming tables without resorting to a full refresh each time.
UNDERSTANDING WHAT REQUIRES A FULL REFRESH
Not all metadata changes require a full refresh. Here is a breakdown:
Changes that typically DO NOT require full refresh:
- Adding new columns to the end of a streaming table definition
- Changing table-level TBLPROPERTIES (comments, tags, etc.)
- Adding or modifying column comments
- Setting column-level tags or masks via ALTER STREAMING TABLE
Changes that typically DO require full refresh:
- Renaming existing columns
- Dropping columns
- Changing column data types
- Fundamentally restructuring the query logic
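To make the distinction concrete, here is a hedged SQL sketch (table and column names are illustrative) contrasting a safe change with one that would normally force a full refresh:

```sql
-- Safe: appending a new column at the end of the definition.
-- Existing rows read as NULL for the new column; no full refresh needed.
CREATE OR REFRESH STREAMING TABLE customers_silver AS
SELECT
  customer_id,
  customer_name,
  country_code,                       -- existing columns unchanged
  current_timestamp() AS ingested_at  -- new column appended at the end
FROM STREAM(customers_bronze);

-- Not safe without column mapping: renaming customer_name to full_name
-- in the query above would normally require a full refresh (see Strategy 1).
```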
STRATEGY 1: USE DELTA COLUMN MAPPING
Enable Delta column mapping on your tables to allow metadata-only column renames and drops without rewriting data files. Set this table property in your pipeline definition:
TBLPROPERTIES (
'delta.columnMapping.mode' = 'name'
)
With column mapping enabled, operations like renaming or dropping columns become metadata-only changes, which means the underlying data files are not rewritten. Note that for streaming reads, you may need to configure a schema tracking location. When used inside SDP, this is managed automatically.
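For completeness, if you also consume these tables with plain Structured Streaming outside the pipeline, Delta requires a schema tracking location so the stream can follow metadata-only renames and drops. A minimal sketch, assuming an active Spark session; the path is illustrative:

```python
# Streaming read of a column-mapped Delta table outside SDP.
# The schema tracking location lets the stream follow metadata-only
# column renames/drops instead of failing on a schema change.
df = (
    spark.readStream
    .option("schemaTrackingLocation", "/checkpoints/customers_silver/_schema")
    .table("customers_silver")
)
```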
STRATEGY 2: USE APPEND FLOWS TO ADD NEW SOURCES WITHOUT FULL REFRESH
If the metadata change involves adding new data sources (for example, new countries going live), use append flows rather than modifying the main query. This lets you add new streaming sources to an existing streaming table without triggering a full refresh.
Python example:
from pyspark import pipelines as dp

dp.create_streaming_table("customers_silver")

@dp.append_flow(target="customers_silver")
def country_a_flow():
    return spark.readStream.table("country_a_bronze")

@dp.append_flow(target="customers_silver")
def country_b_flow():
    return spark.readStream.table("country_b_bronze")
SQL example:
CREATE OR REFRESH STREAMING TABLE customers_silver;
CREATE FLOW country_a_flow
AS INSERT INTO customers_silver BY NAME
SELECT * FROM STREAM(country_a_bronze);
CREATE FLOW country_b_flow
AS INSERT INTO customers_silver BY NAME
SELECT * FROM STREAM(country_b_bronze);
Each new country can be added as a new append flow, and existing data remains untouched.
STRATEGY 3: PROTECT CRITICAL TABLES WITH pipelines.reset.allowed
To prevent accidental full refreshes on tables with expensive historical data, set:
TBLPROPERTIES (
'pipelines.reset.allowed' = 'false'
)
This blocks any full refresh on the table: if someone attempts one, the pipeline skips that table and leaves its existing data intact. Use it as a safety net while you work through the metadata changes.
STRATEGY 4: SEPARATE INGESTION FROM TRANSFORMATION
Consider structuring your pipeline in two layers:
1. A raw/bronze streaming table that ingests data as-is (minimal transformation, stable schema)
2. Downstream materialized views or streaming tables that apply the business metadata and transformations
When business metadata changes, you only need to update the downstream layer. Materialized views are recomputed automatically when their defining query changes (incrementally where the engine can), so they do not need a full refresh; the upstream streaming table holding the raw history stays untouched.
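As a sketch of this layering (paths and names are illustrative, not from your pipeline), the bronze streaming table stays schema-stable while the business logic lives in a materialized view:

```sql
-- Layer 1: stable ingestion, minimal transformation
CREATE OR REFRESH STREAMING TABLE customers_bronze AS
SELECT * FROM STREAM read_files('/mnt/landing/customers/', format => 'json');

-- Layer 2: business metadata and transformations live here;
-- changing this query never touches the streaming layer above
CREATE OR REFRESH MATERIALIZED VIEW customers_enriched
COMMENT 'Customer data with business metadata applied' AS
SELECT
  customer_id,
  upper(country_code) AS country_code,
  CASE WHEN country_code = 'DE' THEN 'EMEA' ELSE 'OTHER' END AS region
FROM customers_bronze;
```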
STRATEGY 5: USE ALTER STREAMING TABLE FOR SUPPORTED CHANGES
For certain metadata changes, you can use ALTER STREAMING TABLE without modifying the pipeline definition at all:
ALTER STREAMING TABLE my_table
ALTER COLUMN my_column COMMENT 'Updated description';
ALTER STREAMING TABLE my_table
SET TAGS ('region' = 'EMEA', 'version' = '2.0');
These changes are applied immediately without requiring any refresh.
RECOMMENDED APPROACH FOR YOUR SCENARIO
Given that you have a multi-country continuous pipeline with 2+ years of historical data:
1. Set pipelines.reset.allowed = false on all critical streaming tables as an immediate safety measure
2. Enable delta.columnMapping.mode = 'name' on your tables to unlock metadata-only column operations
3. Use append flows for new country onboarding, so each country is a separate flow feeding the same target table
4. Move business metadata logic into downstream materialized views where changes do not require a full refresh of the streaming data
5. Add any new columns at the end of your schema definition, where they are picked up incrementally without a reset
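Putting the table-level pieces together, a protected, column-mapped streaming table definition might look like this (property values as discussed above; names are illustrative):

```sql
CREATE OR REFRESH STREAMING TABLE customers_silver
TBLPROPERTIES (
  'pipelines.reset.allowed' = 'false',  -- Strategy 3: block accidental full refresh
  'delta.columnMapping.mode' = 'name'   -- Strategy 1: metadata-only renames/drops
)
AS SELECT * FROM STREAM(customers_bronze);
```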
DOCUMENTATION REFERENCES
- Pipeline update types and full refresh: https://docs.databricks.com/aws/en/delta-live-tables/updates
- Table properties including pipelines.reset.allowed: https://docs.databricks.com/aws/en/delta-live-tables/properties
- Append flows: https://docs.databricks.com/aws/en/delta-live-tables/flows
- Delta column mapping: https://docs.databricks.com/aws/en/delta/column-mapping.html
- Pipeline modes (continuous vs triggered): https://docs.databricks.com/aws/en/delta-live-tables/pipeline-mode
Note: The product previously known as "DLT" or "Delta Live Tables" is now called Lakeflow Spark Declarative Pipelines (SDP).
* This reply was drafted with an agent system I built, which researches and drafts responses from the wide set of documentation I have available and previous memory. I personally review each draft for obvious issues and to monitor system reliability, and I update it when I detect drift, but there is still a small chance something is inaccurate, especially if you are experimenting with brand-new features.
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.