Schema Evolution Issue in Streaming
11-03-2022 03:17 AM
When there is a schema change while reading from and writing to a stream, will the schema change be handled automatically by Spark, or do we need to include .option("mergeSchema", "true")?
E.g.:
# Stream append into a Delta table with schema merging enabled
(
    df.writeStream
        .option("mergeSchema", "true")
        .format("delta")
        .outputMode("append")
        .option("path", "/data/")
        .option("checkpointLocation", "/checkpoint/")
        .start()
        .awaitTermination()
)
Including .option("mergeSchema", "true") still throws the error:
ERROR: A schema mismatch detected when writing to the Delta table
To enable schema migration, please set:
'.option("mergeSchema", "true")'.
Do any additional options or changes need to be applied to the above query? Could you please advise on how to resolve this issue?
Labels: Delta, Schema Evolution Issue
11-03-2022 03:30 AM
mergeSchema doesn't support all operations; in some cases .option("overwriteSchema", "true") is needed instead (see the sketch after the link below). mergeSchema doesn't support:
- Dropping a column
- Changing an existing column's data type (in place)
- Renaming column names that differ only by case (e.g., “Foo” and “foo”)
More on that topic here: https://www.databricks.com/blog/2019/09/24/diving-into-delta-lake-schema-enforcement-evolution.html
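For illustration, here is a minimal sketch of the overwriteSchema path (the table location /data/ is reused from the question and is otherwise hypothetical). overwriteSchema only takes effect together with overwrite mode in a batch write, since it replaces the table schema rather than merging into it:
# Sketch: batch rewrite that replaces the Delta table's schema in place.
# overwriteSchema is honored only in combination with mode("overwrite").
(
    df.write
        .format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")
        .save("/data/")
)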
11-03-2022 11:13 PM
Thanks @Hubert Dudek. In the case of writing to a streaming table, do we also need to change the checkpoint location, in addition to adding .option("mergeSchema", "true"), when a new column is added to the incoming data?
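For reference, a minimal sketch of the scenario being asked about, reusing the paths from the original post: an append stream whose incoming data has gained a new column, with mergeSchema set and the existing checkpoint location left unchanged. Whether that checkpoint can in fact be kept is the open question here; the sketch assumes a purely additive change on the sink side.
# Sketch: incoming micro-batches now carry an additional column. With
# mergeSchema enabled, the Delta sink can add that column on append; this
# sketch reuses the existing checkpoint location unchanged.
(
    df.writeStream
        .format("delta")
        .outputMode("append")
        .option("mergeSchema", "true")
        .option("checkpointLocation", "/checkpoint/")
        .start("/data/")
        .awaitTermination()
)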

