Re: How to use change data feed when schema is cha...

lingareddy_Alva · ‎04-23-2025

When you read from the change data feed in batch mode, Delta Lake always uses a single schema:
By default, it uses the latest table version’s schema, even if you’re only reading an older version.
On Databricks Runtime ≥ 12.2 LTS with column mapping enabled, batch CDF reads instead use the end version’s schema, but still fail if your version range spans a non-additive schema change (e.g. drop, rename, or type change).
Streaming reads (spark.readStream.option("readChangeFeed","true")) support schema evolution automatically, but batch reads do not.
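
For reference, here is a minimal sketch of the two read paths mentioned above. It assumes an existing SparkSession passed in as `spark` and a table with change data feed enabled. The helper names and the table name in the usage comment are hypothetical; `readChangeFeed`, `startingVersion`, and `endingVersion` are the usual CDF reader options.

```python
def cdf_batch_options(start_version, end_version):
    """Reader options for a bounded batch CDF read (both versions inclusive)."""
    return {
        "readChangeFeed": "true",
        "startingVersion": str(start_version),
        "endingVersion": str(end_version),
    }


def read_cdf_batch(spark, table_name, start_version, end_version):
    """Batch read: the whole version range is resolved against a single schema."""
    return (
        spark.read
        .options(**cdf_batch_options(start_version, end_version))
        .table(table_name)
    )


def read_cdf_stream(spark, table_name, start_version):
    """Streaming read of the change feed, starting at start_version."""
    return (
        spark.readStream
        .option("readChangeFeed", "true")
        .option("startingVersion", str(start_version))
        .table(table_name)
    )


# Hypothetical usage, with a placeholder table name:
# df = read_cdf_batch(spark, "my_schema.my_table", 1372, 1375)
```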

This means CDF tried to apply the current/latest schema (v1375) when reading v1372 and detected that the columns/types didn’t match.

There is no built-in option in the Python batch API to switch to the start or end schema;
CDF’s batch path is fixed to use the latest (or, with mapping, the end) schema.
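
Since there is no such option, one possible workaround (not something the API provides) is to split the version range yourself so that no single batch read spans a non-additive change. The sketch below only computes the sub-ranges. It assumes column mapping is enabled, so each read uses its own end version’s schema as described above, and that you already know which versions committed a drop/rename/type change, for example from `DESCRIBE HISTORY`. The function name and the breaking version 1374 in the usage line are hypothetical, and whether a read that starts exactly at the change version is accepted is something to verify on your runtime.

```python
def split_version_range(start_version, end_version, breaking_versions):
    """Split [start_version, end_version] into inclusive sub-ranges such that
    none of them crosses a version that committed a non-additive schema change.

    breaking_versions is a hypothetical, user-supplied list of such versions;
    each one inside the range starts a new sub-range.
    """
    if start_version > end_version:
        raise ValueError("start_version must be <= end_version")
    # A change committed at start_version itself does not force a split.
    cuts = sorted({v for v in breaking_versions if start_version < v <= end_version})
    ranges = []
    lower = start_version
    for cut in cuts:
        ranges.append((lower, cut - 1))  # versions still on the old schema
        lower = cut                      # first version on the new schema
    ranges.append((lower, end_version))
    return ranges


print(split_version_range(1372, 1375, [1374]))  # [(1372, 1373), (1374, 1375)]
```

Each sub-range would then be read separately with its own `startingVersion` and `endingVersion`. The resulting DataFrames can have different schemas, so they need to be reconciled by hand before combining them.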

LR