Hi! This is a known limitation of Change Data Feed. Here's what's happening and your options.
**Why This Happens**
Changing a column from INT to DECIMAL is a non-additive schema change. When reading CDF in batch mode, Delta Lake applies a single schema (the latest or end-version schema) to all Parquet files in the version range. Since the older Parquet files still have INT and the schema expects DECIMAL, you get a conflict.
`mergeSchema` won't help here: it handles additive changes like new columns, not data type changes.
**Your Options**
- Split your CDF reads at the schema change boundary (recommended if you want to avoid a full reload)
Read CDF in two separate ranges (before and after the type change), then cast and union:
```python
# Read versions BEFORE the type change (e.g., up to version N-1)
df_before = (spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", start_version)
    .option("endingVersion", schema_change_version - 1)
    .table("your_table")
)

# Read versions AFTER the type change (version N onward)
df_after = (spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", schema_change_version)
    .option("endingVersion", end_version)
    .table("your_table")
)

# Cast the old schema to match the new type and union.
# Use the same precision/scale as the new column, e.g. decimal(10,2);
# a bare "decimal" defaults to decimal(10,0) and would truncate fractions.
df_before_casted = df_before.withColumn(
    "col_name", df_before["col_name"].cast("decimal(10,2)")
)
df_combined = df_before_casted.unionByName(df_after)
```
You can find the version where the schema changed with `DESCRIBE HISTORY your_table` (look for the `CHANGE COLUMN` operation in the output).
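If it helps to see the cast-and-union step in isolation, here's a minimal plain-Python sketch of the same idea, with made-up sample rows standing in for the two CDF ranges (no Spark session required):

```python
from decimal import Decimal

# Rows read from versions BEFORE the type change: column is still an int
rows_before = [{"id": 1, "col_name": 42}, {"id": 2, "col_name": 7}]

# Rows read from versions AFTER the type change: column is already a Decimal
rows_after = [{"id": 3, "col_name": Decimal("3.50")}]

# Cast the old rows to the new type, mirroring .cast("decimal(10,2)")
rows_before_casted = [
    {**r, "col_name": Decimal(r["col_name"]).quantize(Decimal("0.01"))}
    for r in rows_before
]

# "Union" the two ranges: every row now has a uniform DECIMAL column
combined = rows_before_casted + rows_after
```

The point is simply that the cast happens on the old range only; once both ranges agree on the type, the union is safe.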
- Full reload of the table
If splitting reads is too complex for your pipeline, a one-time full reload at the new schema is the simplest path. After the reload, future CDF reads will work normally since all files will have the new schema.
- Use Type Widening for future-proofing (DBR 15.4+)
The Type Widening feature lets you widen column types (e.g., INT → DECIMAL) without rewriting data files. However, even with type widening, CDF reads across the type change boundary are still not supported, so you'd still need to split reads. The benefit is that it avoids the costly full-table rewrite on the provider side.
Note: Type widening over Delta Sharing requires both provider and recipient on DBR 16.1+ and is only supported for Databricks-to-Databricks sharing.
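For reference, enabling type widening is a table-property change followed by an `ALTER COLUMN`; this is a hedged sketch based on the Delta Lake type-widening docs (assumes an existing `spark` session and a table named `your_table`; verify the DDL against your runtime):

```python
# Opt the table in to type widening (DBR 15.4+).
spark.sql("""
    ALTER TABLE your_table
    SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true')
""")

# With widening enabled, the type change is metadata-only:
# existing Parquet files are not rewritten.
spark.sql("ALTER TABLE your_table ALTER COLUMN col_name TYPE DECIMAL(18, 2)")
```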
**TL;DR**
You cannot read CDF across a data type change in a single query; this is by design. Split your reads at the schema-change version boundary, or do a full reload. For future schema changes, consider type widening to minimize disruption.
Anuj Lathi
Solutions Engineer @ Databricks