CANNOT_UPDATE_TABLE_SCHEMA
2 weeks ago
I'm encountering a puzzling schema merge issue with my Delta Live Table. My setup involves several master tables on Databricks, and due to a schema change in the source database, one of my Delta Live Tables has a column (e.g., "reference_score") that was originally an integer and has now been changed to a float.
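For context, the table definition is roughly along these lines (a simplified sketch; the table and source names here are placeholders, not the real ones):

import dlt
from pyspark.sql.functions import col

@dlt.table(name="master_scores")  # placeholder name
def master_scores():
    # reference_score now arrives as FLOAT from the source database,
    # while the existing Delta table was created with it as INT
    return spark.read.table("source_db.scores").select(
        col("id"),
        col("reference_score"),
    )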
After running the pipeline, I receive errors like:
com.databricks.pipelines.common.errors.DLTAnalysisException: [CANNOT_UPDATE_TABLE_SCHEMA] Failed to merge the current and new schemas for table ... To proceed with this schema change, you can trigger a full refresh of this table. Depending on your use case and the schema changes, you may be able to obviate the schema change -- you can update your queries so the output schema is compatible with the existing schema (e.g., by explicitly casting columns to the correct data type).
com.databricks.sql.transaction.tahoe.DeltaAnalysisException: [DELTA_FAILED_TO_MERGE_FIELDS] Failed to merge fields 'reference_score' and 'reference_score'
com.databricks.sql.transaction.tahoe.DeltaAnalysisException: [DELTA_MERGE_INCOMPATIBLE_DATATYPE] Failed to merge incompatible data types IntegerType and FloatType
Has anyone faced this issue? Is there a way to overwrite or merge the schema changes without triggering a full refresh? I'm curious if there are any workarounds or best practices to handle such a transformation scenario.
Thanks in advance!
Hung Nguyen
Labels:
- Delta Lake
- Spark
- Workflows
2 weeks ago
Hey Hung, this is a pretty common issue when working with Delta Live Tables (DLT) and schema evolution. When the data type of a column changes (like in your case from Integer to Float), Delta sees that as an incompatible schema change, and by default DLT doesn't allow automatic merging of those types to avoid data quality issues. The cleanest solution is to explicitly cast the column in your DLT transformation code so the output matches the existing table schema. For example, you can cast reference_score to FLOAT using something like col("reference_score").cast("float"), which helps DLT see the schema as compatible and avoids needing a full refresh. If casting doesn't fit your case, or if you're okay with refreshing the data, you can also trigger a full refresh by deleting the target table and letting DLT recreate it with the new schema. But in most cases, a simple cast in your transformation logic is enough to resolve this smoothly. Let me know if you'd like help updating your DLT code!
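A minimal sketch of what that cast could look like (assuming the table is defined in a Python notebook; table and source names are placeholders):

import dlt
from pyspark.sql.functions import col

@dlt.table(name="master_scores")  # placeholder name
def master_scores():
    return (
        spark.read.table("source_db.scores")
        # cast explicitly so the output schema is deterministic from run to run
        .withColumn("reference_score", col("reference_score").cast("float"))
    )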
Regards,
Brahma
2 weeks ago - last edited 2 weeks ago
Hi @Brahmareddy,
Thanks for your input. I've already tried explicitly casting the column to FLOAT as suggested, but the error still persists, and I'm stuck with the same schema merge issue. I'm not too keen on triggering a full refresh on the entire table just yet, as I'd like to explore any alternatives first.
Have you come across any other workarounds or tweaks (perhaps some configuration change or a hidden trick) that could help resolve this without resorting to a full refresh? Any further insights or suggestions would be much appreciated!
Hung Nguyen
2 weeks ago
Hey Hung, totally get it, not wanting to trigger a full refresh makes sense, especially if the table is large. Since casting to FLOAT didn't solve it, one thing you can try is setting this config at the top of your DLT notebook: spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true"), which tells Delta to auto-merge compatible schema changes where it can. Also, make sure the cast is actually happening before the data is returned in your DLT function, so the final schema matches what you want; there's a rough sketch below. If you're using schema hints or expectations, double-check that those also match the FLOAT type. And if nothing else works, you could try just dropping the target table manually; DLT will recreate it fresh next time without needing to reset the entire pipeline. Let me know how it goes, happy to help figure it out further!
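Roughly, the notebook could look like this (a sketch, again assuming a Python table definition; names are placeholders):

import dlt
from pyspark.sql.functions import col

# ask Delta to auto-merge compatible schema changes where it can
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

@dlt.table(name="master_scores")  # placeholder name
def master_scores():
    df = spark.read.table("source_db.scores")
    # the cast happens before the DataFrame is returned, so the output
    # schema of this table has reference_score as FLOAT
    return df.withColumn("reference_score", col("reference_score").cast("float"))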
Regards,
Brahma
2 weeks ago
Hi Brahma,
Thanks for the suggestion. I tried setting spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true") at the top of my DLT notebook as you recommended, but unfortunately, I'm still encountering the same error.
Just to clarify, my original table had a column named reference_score as INT, and it was later changed to FLOAT. Based on that, I assumed I might need to cast it back to INT, but that approach doesn't seem to work either.
I agree that dropping the target table would definitely resolve the issue, but before I take that step, I'm really curious why a schema change like this can't be handled more seamlessly.
Hung Nguyen
2 weeks ago
Hey Hung, how are you doing today? This kind of schema change feels like it should be handled smoothly, especially when going from INT to FLOAT, which is usually a safe, upward-compatible change. But in DLT things are a bit stricter, because it tracks schema history closely for reliability. Even with autoMerge enabled, Delta Live Tables won't auto-merge incompatible types like INT and FLOAT unless everything lines up perfectly, including any schema hints, expectations, and previous metadata. Sometimes even a tiny mismatch in how the schema is inferred or cast can cause this conflict to persist. If casting to FLOAT didn't help, and casting back to INT doesn't work either, it's likely that the existing Delta metadata is locked into expecting INT, and DLT is refusing to merge to avoid potential data issues. Dropping the target table is usually the cleanest way out when this happens; DLT will rebuild it with the new schema on the next update. I know it's not ideal, but it's often the quickest fix. Let me know if you want help scripting that out safely!
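If you do go that route, the drop itself is a one-liner; just double-check the catalog and schema names first (the ones below are placeholders):

# WARNING: this removes the table and its data. DLT will recreate it
# with the new schema on the next pipeline update.
spark.sql("DROP TABLE IF EXISTS my_catalog.my_schema.master_scores")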
Regards,
Brahma
2 weeks ago
Hey Hung, before dropping the table, there are a few other things you could try. One option is to point your DLT logic to a new table with the updated FLOAT column, just to confirm everything works with the new schema; this helps isolate whether the issue is with the existing table's metadata. You could also try adding .option("mergeSchema", "true") to your writeStream if you're using custom logic (though in DLT it's sometimes limited). Another thing to check is whether any expectations or schema hints are applied; if so, try removing or updating them to match the new FLOAT type. You can even pass an explicit schema to the @dlt.table decorator to force it to recognize reference_score as a FLOAT. Sometimes these little tweaks can help DLT adjust without a full reset; there's a rough sketch below. Let me know if you want help testing one of these approaches!
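A rough sketch of the new-table-plus-explicit-schema idea (assuming the Python decorator API; the table and column names are placeholders):

import dlt
from pyspark.sql.functions import col

# write to a fresh target table and declare the schema up front,
# so the old table's metadata doesn't constrain the merge
@dlt.table(
    name="master_scores_v2",  # placeholder name for the new table
    schema="id BIGINT, reference_score FLOAT",
)
def master_scores_v2():
    return spark.read.table("source_db.scores").select(
        col("id").cast("bigint"),
        col("reference_score").cast("float"),
    )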
Regards,
Brahma
2 weeks ago
Dear Brahma,
Thank you very much for your detailed explanation and guidance. I truly appreciate the insights and suggestions you provided. I'll definitely try pointing the DLT logic to a new table with the updated FLOAT column, consider adding the .option("mergeSchema", "true") where applicable, and review any schema hints or expectations first. I want to exhaust these options before resorting to dropping the target table. Your help has been invaluable, and I might reach out if I need further assistance with scripting these adjustments safely.
Hung Nguyen
2 weeks ago
Dear Hung,
Thank you so much for the kind words, I'm really glad the suggestions were helpful! You're absolutely doing the right thing by trying those options first before going for a full table drop. Testing with a new table and checking schema hints or expectations often reveals where things are getting stuck. And yes, feel free to reach out anytime if you need help scripting the changes or just want a second pair of eyes on your setup. Happy to support however I can; good luck, and I hope it works out smoothly!
Warm regards,
Brahma

