I have a Delta Live Tables (DLT) pipeline that loads and transforms data. My current problem is that the schema inferred by DLT does not match the actual schema of the table. The table is generated via a groupBy().pivot() operation as follows:
# Count occurrences of each (trip, signal, value) combination
gb = (
    df.groupBy(['unique_trip_id', 'signal', 'value'])
    .count()
)

# Pivot the distinct signals into columns, filling missing combinations with 0
gb = (
    gb.groupBy(['unique_trip_id', 'value'])
    .pivot("signal")
    .sum("count")
    .fillna(0)
)
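
For context, this aggregation runs inside a DLT table definition, roughly like the following (the function and upstream table names here are illustrative, not my exact code):

import dlt

@dlt.table(name="pivoted_signals")
def pivoted_signals():
    # "raw_signals" stands in for my actual upstream DLT table
    df = dlt.read("raw_signals")
    counts = (
        df.groupBy(['unique_trip_id', 'signal', 'value'])
        .count()
    )
    return (
        counts.groupBy(['unique_trip_id', 'value'])
        .pivot("signal")
        .sum("count")
        .fillna(0)
    )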
I get the following error message:
org.apache.spark.sql.AnalysisException: A schema mismatch detected when writing to the Delta table (Table ID: fdecc1fa-fadd-4779-bc43-d93d87c9cc9e).
To enable schema migration using DataFrameWriter or DataStreamWriter, please set:
'.option("mergeSchema", "true")'.
For other operations, set the session configuration
spark.databricks.delta.schema.autoMerge.enabled to "true". See the documentation
specific to the operation for details.
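
If this were a plain (non-DLT) notebook write, I understand the fix the error suggests would look something like this (the target table name is just an example):

# Ordinary Delta write with schema evolution enabled; in DLT the
# framework owns the write, so I can't call this directly
(
    gb.write.format("delta")
    .option("mergeSchema", "true")
    .mode("append")
    .saveAsTable("my_pivoted_table")
)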
My question is: how can I set this option in my notebook for Delta Live Tables? Or am I doing something wrong that is causing the schema inference to fail?
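
The closest thing I can think of is setting the session configuration at the top of the DLT notebook, but I don't know whether DLT actually honors it (this is a guess on my part):

# Attempt to enable schema auto-merge for the whole session;
# unclear to me whether DLT pipelines respect this
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")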
Thanks for your help.