09-06-2022 12:48 AM
I have a Delta Live Tables pipeline that loads and transforms data. Currently the schema inferred by DLT does not match the actual schema of the table. The table is generated via a groupBy/pivot operation as follows:
gb = (
    df.groupBy(['unique_trip_id', 'signal', 'value'])
      .count()
)
gb = (
    gb.groupBy(['unique_trip_id', 'value'])
      .pivot("signal")
      .sum("count")
      .fillna(0)
)
I get the following error message:
org.apache.spark.sql.AnalysisException: A schema mismatch detected when writing to the Delta table (Table ID: fdecc1fa-fadd-4779-bc43-d93d87c9cc9e).
To enable schema migration using DataFrameWriter or DataStreamWriter, please set:
'.option("mergeSchema", "true")'.
For other operations, set the session configuration
spark.databricks.delta.schema.autoMerge.enabled to "true". See the documentation
specific to the operation for details.
My question is: how can I set this option in my notebook for Delta Live Tables? Or am I doing something wrong that is causing the schema inference to fail?
Thanks for your help.
- Labels:
  - Delta
  - Delta Live Tables
  - DLT Pipeline
  - Schema
Accepted Solutions
09-06-2022 01:44 AM
I was able to get around this by specifying the table schema in the table decorator.
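For example, a minimal sketch of this workaround: the `@dlt.table` decorator accepts a `schema` argument (a DDL string or a `StructType`), which pins the table's schema so DLT does not have to infer it from the pivoted DataFrame. The source view name `clean_trips` and the signal column names (`speed`, `rpm`) are assumptions for illustration; the pivoted column names must match the actual values in the `signal` column.

```python
import dlt

# Hypothetical DDL schema for the pivoted table. Column names after
# unique_trip_id and value must match the distinct values produced by
# .pivot("signal") in your data.
PIVOT_SCHEMA = """
    unique_trip_id STRING,
    value STRING,
    speed BIGINT,
    rpm BIGINT
"""

@dlt.table(
    name="trip_signal_counts",
    schema=PIVOT_SCHEMA,  # explicit schema avoids the inference mismatch
)
def trip_signal_counts():
    df = dlt.read("clean_trips")  # hypothetical upstream view
    gb = df.groupBy("unique_trip_id", "signal", "value").count()
    return (
        gb.groupBy("unique_trip_id", "value")
          .pivot("signal")
          .sum("count")
          .fillna(0)
    )
```

This definition only runs inside a DLT pipeline, so treat it as a template rather than standalone code.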
09-09-2022 04:08 PM
Thank you for your reply. I will mark your response as best.

