Schema Evolution and Schema Enforcement without Delta Live Tables & Unity Catalog
In Delta Lake, schema evolution with mergeSchema handles column additions cleanly: new columns are added and existing rows get NULL for them. But when the incoming data changes a column's data type (for example, a column that was INT now arrives as STRING from the source), mergeSchema throws an error even in append mode. In contrast, when reading formats like Parquet, ORC, and Avro, schema merging seems to cope with both column additions and type changes without issue.

So my questions are:

- Is the data type restriction in Delta's mergeSchema a deliberate design decision to protect the integrity of existing data, or is there a way to handle data type changes in append mode without resorting to casting before the write or doing a full overwrite?
- In a production pipeline where the source schema keeps changing dynamically and we cannot hardcode the schema, what is the recommended approach to handle data type changes gracefully without breaking the pipeline?
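For context, here is a minimal sketch of the "cast before the write" workaround mentioned above. It models schemas as plain `{column: type}` dicts purely for illustration; in a real PySpark job you would derive them from `incoming_df.schema` and the Delta table's schema, then apply the planned casts with `df.withColumn(col, F.col(col).cast(t))` before appending. The helper name `plan_casts` is hypothetical, not part of any Delta Lake API.

```python
# Hypothetical helper illustrating the cast-before-write pattern.
# Schemas are modeled as simple {column: type_name} dicts; in a real
# pipeline these would come from the incoming DataFrame's schema and
# the target Delta table's schema.

def plan_casts(incoming, target):
    """Return {column: target_type} for columns present in both schemas
    whose incoming type differs from the Delta table's type. Columns
    that are new in the incoming data are left alone so mergeSchema
    can still add them."""
    return {
        col: t
        for col, t in target.items()
        if col in incoming and incoming[col] != t
    }

target = {"id": "int", "amount": "double"}
incoming = {"id": "string", "amount": "double", "note": "string"}

# Only "id" drifted in type; "note" is new and untouched.
print(plan_casts(incoming, target))  # {'id': 'int'}
```

The idea is to reconcile only the drifted columns back to the table's declared types at the start of each run, so the append succeeds while mergeSchema continues to handle genuinely new columns.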
#deltalake #Schema