I have a Delta Live Tables (DLT) pipeline that loads and transforms data. My current problem is that the schema inferred by DLT does not match the actual schema of the table. The table is generated via a groupBy().pivot() operation as follows:
# Count occurrences of each (trip, signal, value) combination
gb = (
    df.groupBy(['unique_trip_id', 'signal', 'value'])
    .count()
)

# Pivot the distinct signals into columns, filling missing combinations with 0
gb = (
    gb.groupBy(['unique_trip_id', 'value'])
    .pivot("signal")
    .sum("count")
    .fillna(0)
)
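
For context, this aggregation runs inside a DLT table definition, roughly like the following (the function and upstream table names here are illustrative, not my exact code):

import dlt

@dlt.table(name="pivoted_signals")
def pivoted_signals():
    # "raw_signals" stands in for my actual upstream DLT table
    df = dlt.read("raw_signals")
    counts = (
        df.groupBy(['unique_trip_id', 'signal', 'value'])
        .count()
    )
    return (
        counts.groupBy(['unique_trip_id', 'value'])
        .pivot("signal")
        .sum("count")
        .fillna(0)
    )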
I get the following error message:
org.apache.spark.sql.AnalysisException: A schema mismatch detected when writing to the Delta table (Table ID: fdecc1fa-fadd-4779-bc43-d93d87c9cc9e).
To enable schema migration using DataFrameWriter or DataStreamWriter, please set:
'.option("mergeSchema", "true")'.
For other operations, set the session configuration
spark.databricks.delta.schema.autoMerge.enabled to "true". See the documentation
specific to the operation for details.
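
If this were a plain (non-DLT) notebook write, I understand the fix the error suggests would look something like this (the target table name is just an example):

# Ordinary Delta write with schema evolution enabled; in DLT the
# framework owns the write, so I can't call this directly
(
    gb.write.format("delta")
    .option("mergeSchema", "true")
    .mode("append")
    .saveAsTable("my_pivoted_table")
)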
My question is: how can I set this option in my notebook for Delta Live Tables? Or am I doing something wrong that is causing the schema inference to fail?
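
The closest thing I can think of is setting the session configuration at the top of the DLT notebook, but I don't know whether DLT actually honors it (this is a guess on my part):

# Attempt to enable schema auto-merge for the whole session;
# unclear to me whether DLT pipelines respect this
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")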
Thanks for your help.