
Can't mergeSchema handle int and bigint?

Dhruv-22
Contributor

I have a table with a column of data type 'bigint'. Since I do full loads, I overwrite it with new data, and I used 'mergeSchema' to handle schema changes. The new data's column is of type 'int'. I thought mergeSchema could easily handle that, but it throws an error.

crm_retail_df.write.mode("overwrite").format("delta").option('mergeSchema', True).saveAsTable(silver_table)

-- Output
[DELTA_FAILED_TO_MERGE_FIELDS] Failed to merge fields 'Lob_Pk' and 'Lob_Pk' SQLSTATE: 22005

Shouldn't 'mergeSchema' automatically widen the incoming column from int to bigint? If not, does mergeSchema only handle cases where the datatypes of the new and old data match?

1 ACCEPTED SOLUTION


K_Anudeep
Databricks Employee

Hello @Dhruv-22 ,

No, mergeSchema doesn't auto-widen an incoming INT column to a table's BIGINT (nor does it auto-cast). mergeSchema mainly helps add new columns (and historically only a small set of numeric upcasts), but it won't change an existing column's type or cast your DataFrame to match the table. That's why you see [DELTA_FAILED_TO_MERGE_FIELDS].

Doc Link: https://www.databricks.com/blog/2019/09/24/diving-into-delta-lake-schema-enforcement-evolution.html

However, you can rewrite the table's schema and data using overwriteSchema:

(df_with_new_types
    .write.format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")  # replace the table schema instead of merging
    .saveAsTable(table_name))
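
Note that overwriteSchema replaces the table's schema (and, in overwrite mode, its data) outright rather than merging, so it suits full loads like yours, but any columns that exist only in the old table are dropped.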

If you are using DBR 15.4 or above, you can also try enabling type widening.

Doc Link: https://docs.databricks.com/aws/en/delta/type-widening#enable-type-widening
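
As a rough sketch (assuming DBR 15.4 or above, and reusing the table and DataFrame names from this thread), enabling the table property from the linked docs and then retrying the original write would look like this:

# Sketch: enable type widening on the existing table (property per the
# linked docs), then retry the overwrite with mergeSchema.
spark.sql(f"ALTER TABLE {silver_table} SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true')")

(crm_retail_df.write
    .mode("overwrite")
    .format("delta")
    .option("mergeSchema", "true")
    .saveAsTable(silver_table))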

Anudeep


2 REPLIES

Chiran-Gajula
New Contributor

Hi Dhruv,

Delta won't automatically upcast unless you handle it explicitly. Cast the Lob_Pk column to LongType (which maps to BIGINT in SQL/Delta). Try the snippet below:

from pyspark.sql.functions import col
from pyspark.sql.types import LongType

# Explicitly widen Lob_Pk to LongType (BIGINT) before writing.
crm_retail_df = crm_retail_df.withColumn("Lob_Pk", col("Lob_Pk").cast(LongType()))

(crm_retail_df.write
    .mode("overwrite")
    .format("delta")
    .option("mergeSchema", "true")
    .saveAsTable(silver_table))
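
As a quick sanity check (reusing the names above), you can confirm the column now lands as BIGINT:

# Lob_Pk should print as 'long', i.e. BIGINT in SQL.
spark.table(silver_table).printSchema()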

G.Chiranjeevi