Identify updated rows during incremental refresh in DLT Materialized Views

ManoramTaparia — Wed, 03 Sep 2025 13:01:51 GMT

Hello, every time that I run a delta live table materialized view in serverless , I get a log of "COMPLETE RECOMPUTE". I realised I was using current_timestamp as a column in my MV to identify rows which got updated in the last refresh. But that makes the MV refresh non-deterministic leading to a log of "COMPLETE RECOMPUTE" every time.

How do I perform incremental refresh and also be able to identify which rows changed as part of the refresh?

Re: Identify updated rows during incremental refresh in DLT Materialized Views

ck7007 — Wed, 03 Sep 2025 14:58:41 GMT

@ManoramTaparia The issue is that current_timestamp() makes your MV non-deterministic, forcing complete recomputes. Here's how to fix it:

Solution: Use the Source Table's Change Tracking

Option 1: Leverage Source Table's Timestamp Column
@Dlt.table(
name="my_materialized_view"
)
def my_mv():
return (
spark.readStream
.option("readChangeFeed", "true")
.table("source_table")
.select(
"",
col("_commit_timestamp"). alias("last_updated") # From CDF

Option 2: Track Changes with Watermarking
@Dlt.table(
name="incremental_mv"
)
def incremental_mv():
df = dlt.read("source_table")

# Add deterministic timestamp from source data
return df.select(
"",
col("modified_date"). alias("refresh_timestamp") # Use source column
).filter(
col("modified_date") > spark.conf.get("last_refresh_time", "1900-01-01")

Option 3: Use DLT Change Data Feed

@Dlt.table(
name="tracked_mv",
table_properties={
"delta.enableChangeDataFeed": "true"
}
)
def tracked_mv():
return dlt.read_stream("source_table")

# Query changes separately
changes_df = spark.read.format("delta") \
.option("readChangeFeed", "true") \
.option("startingVersion", last_version) \
.table("tracked_mv")

Key Points

Never use current_timestamp() in MV definitions
Use source table timestamps or change feeds for tracking
Enable CDF on source tables for proper change tracking

The CDF approach (Option 3) gives you both incremental refresh AND change identification without forcing recomputes.

Which source table timestamp columns are available?

topic Identify updated rows during incremental refresh in DLT Materialized Views in Data Engineering

Identify updated rows during incremental refresh in DLT Materialized Views

Re: Identify updated rows during incremental refresh in DLT Materialized Views

Solution: Use the Source Table's Change Tracking

Key Points