04-23-2025 10:31 AM
When you read from the change data feed (CDF) in batch mode, Delta Lake always uses a single schema:
- By default, it uses the latest table version’s schema, even if you’re only reading an older version.
- On Databricks Runtime 12.2 LTS and above, with column mapping enabled, batch CDF reads instead use the schema of the end version you specify, but they still fail if your version range spans a non-additive schema change (e.g. a column drop, rename, or type change).
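To make the two rules above concrete, here is a small pure-Python model of which table version’s schema a batch CDF read ends up using. It is only a sketch of the behavior described in this post, not Delta Lake code: the function name, its parameters, and `breaking_versions` (the versions that made a drop/rename/type change) are hypothetical, and it assumes a range "spans" a change at version v when start < v <= end.

```python
def batch_cdf_schema_version(
    start: int,
    end: int,
    latest: int,
    column_mapping: bool,
    runtime_12_2_or_later: bool,
    breaking_versions: list[int],
) -> int:
    """Hypothetical model: return the table version whose schema a batch CDF
    read over versions [start, end] would use, per the rules described above."""
    if column_mapping and runtime_12_2_or_later:
        # Assumption: a range "spans" a non-additive change made at version v
        # when it contains versions on both sides of it (start < v <= end).
        spanned = [v for v in breaking_versions if start < v <= end]
        if spanned:
            raise ValueError(
                f"range {start}..{end} spans non-additive schema change(s) at {spanned}"
            )
        # Column mapping on 12.2 LTS+: the requested end version's schema is used.
        return end
    # Default: the latest table version's schema, whatever range is read.
    return latest
```

For example, `batch_cdf_schema_version(1372, 1372, 1375, False, False, [])` returns 1375: reading only v1372 still uses v1375’s schema, which is the kind of mismatch in your error.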
Streaming reads (spark.readStream.option("readChangeFeed", "true")) can handle schema evolution across the stream, but batch reads cannot.
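For comparison, a streaming CDF read looks roughly like the sketch below. `spark` is assumed to be an existing SparkSession, the table name is a placeholder, and `read_cdf_stream` is a hypothetical helper. `readChangeFeed` and `startingVersion` are standard Delta CDF options; the optional `schemaTrackingLocation` is, to my understanding, what newer Delta/Databricks versions need to stream through non-additive changes on column-mapped tables, so check the docs for your runtime before relying on it.

```python
def read_cdf_stream(spark, table_name, starting_version, schema_tracking_location=None):
    """Sketch: start a streaming read of a Delta table's change data feed.

    `spark` is an existing SparkSession; `table_name` is a placeholder.
    """
    reader = (
        spark.readStream.format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", starting_version)
    )
    if schema_tracking_location is not None:
        # Assumption: only needed for non-additive changes on column-mapped
        # tables; the path is expected to sit under the stream's checkpoint.
        reader = reader.option("schemaTrackingLocation", schema_tracking_location)
    return reader.table(table_name)
```

You would then attach a `writeStream` with a `checkpointLocation` as usual.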
In your case, the batch read applied the latest schema (v1375) to the changes from v1372 and detected that the columns/types didn’t match.
There is no built-in option in the Python batch API to choose which version’s schema is used (start or otherwise): CDF’s batch path always uses the latest schema, or the end version’s schema when column mapping is enabled.
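Since the schema choice can’t be overridden, one possible workaround (my suggestion, not an official option) is to split the version range at each non-additive schema change and read each segment separately. This only helps with column mapping on Databricks Runtime 12.2 LTS and above, where each segment is read with its own end version’s schema; without column mapping every segment would still use the latest schema. Both function names and `breaking_versions` (e.g. collected from `DESCRIBE HISTORY`) are hypothetical, and the split assumes a range spans a change at version v when start < v <= end.

```python
def split_at_schema_changes(start, end, breaking_versions):
    """Split [start, end] into inclusive sub-ranges that don't span any version
    in breaking_versions (assumption: a range spans v if start < v <= end)."""
    ranges = []
    seg_start = start
    for v in sorted(set(breaking_versions)):
        if seg_start < v <= end:
            # Close the current segment just before the change.
            ranges.append((seg_start, v - 1))
            seg_start = v
    ranges.append((seg_start, end))
    return ranges


def read_cdf_batches(spark, table_name, start, end, breaking_versions):
    """Sketch: one batch CDF DataFrame per sub-range. Schemas may differ between
    segments, so reconcile renamed/retyped columns yourself before combining."""
    return [
        spark.read.format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", s)
        .option("endingVersion", e)
        .table(table_name)
        for s, e in split_at_schema_changes(start, end, breaking_versions)
    ]
```

For a change at v1374, `split_at_schema_changes(1370, 1375, [1374])` gives `[(1370, 1373), (1374, 1375)]`. Whether Delta accepts a segment that starts exactly at the change version is worth verifying on your table.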