Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Schema Evolution Issue in Streaming

Sandy21
New Contributor III

When there is a schema change while reading from and writing to a stream, will the schema change be handled automatically by Spark,

or do we need to include .option("mergeSchema", "true")?

Eg:

(df.writeStream
    .option("mergeSchema", "true")
    .format("delta")
    .outputMode("append")
    .option("path", "/data/")
    .option("checkpointLocation", "/checkpoint/")
    .start()
    .awaitTermination())

Including .option("mergeSchema", "true") still throws the error:

ERROR: A schema mismatch detected when writing to the Delta table

To enable schema migration, please set:

'.option("mergeSchema", "true")'.

Do any additional options/changes need to be made to the above query? Could you please advise how to resolve this issue?

2 REPLIES

Hubert-Dudek
Esteemed Contributor III

mergeSchema doesn't cover all schema changes; in some cases .option("overwriteSchema", "true") is needed instead. mergeSchema does not support:

  • Dropping a column
  • Changing an existing column's data type in place
  • Renaming columns that differ only by case (e.g., "Foo" and "foo")

More on that topic here: https://www.databricks.com/blog/2019/09/24/diving-into-delta-lake-schema-enforcement-evolution.html
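The limitations above can be sketched in plain Python, without a Spark cluster. This is only an illustration of the merge rules Delta applies, not Delta's actual implementation; merge_schema is a hypothetical helper:

```python
def merge_schema(current, incoming):
    """Illustrative Delta-style mergeSchema semantics.

    current/incoming: dict mapping column name -> type name.
    New columns are added; in-place type changes and case-only
    renames are rejected, matching the limitations listed above.
    """
    merged = dict(current)
    lower_names = {name.lower(): name for name in current}
    for col, dtype in incoming.items():
        if col in merged:
            if merged[col] != dtype:
                # mergeSchema cannot change an existing column's type
                raise ValueError(f"type change for {col}: {merged[col]} -> {dtype}")
        elif col.lower() in lower_names:
            # mergeSchema cannot rename columns differing only by case
            raise ValueError(f"case-only rename: {lower_names[col.lower()]} vs {col}")
        else:
            # additive new column: allowed, schema evolves
            merged[col] = dtype
    return merged

print(merge_schema({"id": "int"}, {"id": "int", "name": "string"}))
# {'id': 'int', 'name': 'string'}
```

Adding a column merges cleanly; changing "id" to "string" or sending "Id" alongside "id" raises, which is when overwriteSchema (and a full rewrite of the table) becomes necessary.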

Sandy21
New Contributor III

Thanks @Hubert Dudek​. In the case of writing to a streaming table, if a new column is added to the incoming data, do we need to change the checkpoint location as well, in addition to adding .option("mergeSchema", "true")?