09-06-2022 12:48 AM
I have a Delta Live Tables pipeline that loads and transforms data. Currently the schema inferred by DLT does not match the actual schema of the table. The table is generated via a groupBy/pivot operation as follows:
gb = (
    df.groupBy(['unique_trip_id', 'signal', 'value'])
      .count()
)
gb = (
    gb.groupBy(['unique_trip_id', 'value'])
      .pivot("signal")
      .sum("count")
      .fillna(0)
)
I get the following error message:
org.apache.spark.sql.AnalysisException: A schema mismatch detected when writing to the Delta table (Table ID: fdecc1fa-fadd-4779-bc43-d93d87c9cc9e).
To enable schema migration using DataFrameWriter or DataStreamWriter, please set:
'.option("mergeSchema", "true")'.
For other operations, set the session configuration
spark.databricks.delta.schema.autoMerge.enabled to "true". See the documentation
specific to the operation for details.
My question is: how can I set this option in my notebook for Delta Live Tables? Or am I doing something wrong that is causing the schema inference to fail?
Thanks for your help.
- Labels:
  - Delta
  - Delta Live Tables
  - DLT Pipeline
  - Schema
Accepted Solutions
09-06-2022 01:44 AM
I was able to get around this by specifying the table schema in the table decorator.
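For example, a minimal sketch of this workaround: the `@dlt.table` decorator accepts a `schema` argument (a DDL string or a `StructType`), which pins the table's schema so DLT does not have to infer it from the pivoted DataFrame. The source view name `clean_trips` and the signal column names (`speed`, `rpm`) are assumptions for illustration; the pivoted column names must match the actual values in the `signal` column.

```python
import dlt

# Hypothetical DDL schema for the pivoted table. Column names after
# unique_trip_id and value must match the distinct values produced by
# .pivot("signal") in your data.
PIVOT_SCHEMA = """
    unique_trip_id STRING,
    value STRING,
    speed BIGINT,
    rpm BIGINT
"""

@dlt.table(
    name="trip_signal_counts",
    schema=PIVOT_SCHEMA,  # explicit schema avoids the inference mismatch
)
def trip_signal_counts():
    df = dlt.read("clean_trips")  # hypothetical upstream view
    gb = df.groupBy("unique_trip_id", "signal", "value").count()
    return (
        gb.groupBy("unique_trip_id", "value")
          .pivot("signal")
          .sum("count")
          .fillna(0)
    )
```

This definition only runs inside a DLT pipeline, so treat it as a template rather than standalone code.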
09-09-2022 04:08 PM
Thank you for your reply. I will mark your response as best.

