
Delta lake schema enforcement allows datatype mismatch on write using MERGE-operation [python]

signo
New Contributor II

Databricks Runtime: 12.2 LTS, Spark: 3.3.2, Delta Lake: 2.2.0

A target table with schema ([c1: integer, c2: integer]) accepts a MERGE write from source data with schema ([c1: integer, c2: double]). I expected it to throw an exception, as it does with a normal Spark append (INSERT) write, but instead the rows were written and the double values for c2 were stored as integers (1.4 became 1 and 3.5 became 3 in the output below).

from pyspark.sql.types import StructType, StructField, IntegerType, DoubleType
from delta import DeltaTable
  
# Source data
schema = StructType([StructField("c1", IntegerType(), False), StructField("c2", DoubleType(), False)])
rdd_output = spark.sparkContext.parallelize([(4, 1.4), (5, 5.0), (6, 3.5),])
df_source = spark.createDataFrame(rdd_output, schema=schema)
  
# write source to target table using merge
target_table = DeltaTable.forName(spark, "default.test_datatype_misalignment")
merge = target_table.alias("target").merge(df_source.alias("source"), "target.c1 = source.c1")          
merge.whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()
spark.table("default.test_datatype_misalignment").show()
 
# OUTPUT
#+---+---+
#| c1| c2|
#+---+---+
#|  1|  1|
#|  2|  1|
#|  3|  5|
#|  4|  1|
#|  5|  5|
#|  6|  3|
#+---+---+
  
# write source to target table using insert
df_source.write.format("delta").mode("append").saveAsTable("default.test_datatype_misalignment")
 
# OUTPUT
#AnalysisException: Failed to merge fields 'c2' and 'c2'. Failed to merge incompatible data types IntegerType and DoubleType

I am expecting an exception to be raised regardless of the write command. Why is this not the case?

2 REPLIES

-werners-
Esteemed Contributor III

perhaps schema evolution is enabled?
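
Whatever the cause turns out to be, a check before the merge can make the mismatch fail loudly on both write paths. The following is only a minimal sketch: `find_type_mismatches` and `assert_types_match` are hypothetical helper names, not part of Delta Lake or PySpark. They assume the inputs are lists of (column name, type string) pairs, which is the shape a PySpark DataFrame's `dtypes` attribute returns.

```python
def find_type_mismatches(source_dtypes, target_dtypes):
    """Return (column, source_type, target_type) for columns whose types differ.

    Both arguments are lists of (column_name, type_string) pairs,
    e.g. [("c1", "int"), ("c2", "double")].
    Columns that exist on only one side are ignored by this sketch.
    """
    target_types = dict(target_dtypes)
    return [
        (name, src_type, target_types[name])
        for name, src_type in source_dtypes
        if name in target_types and src_type != target_types[name]
    ]


def assert_types_match(source_dtypes, target_dtypes):
    """Raise TypeError if any shared column has a different type on each side."""
    mismatches = find_type_mismatches(source_dtypes, target_dtypes)
    if mismatches:
        details = ", ".join(
            f"{name}: source {src} vs target {tgt}" for name, src, tgt in mismatches
        )
        raise TypeError(f"Datatype mismatch before MERGE: {details}")
```

With the example from the question, this would be called as `assert_types_match(df_source.dtypes, spark.table("default.test_datatype_misalignment").dtypes)` just before `merge...execute()`, so the double-versus-integer difference on c2 raises an error instead of being written.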

Anonymous
Not applicable

Hi @Sigrun Nordli

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
