Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Merge with schema evolution fails because of upper case columns

Dhruv-22
Contributor II

The following is a minimal reproducible example of what I'm facing right now.

%sql
CREATE OR REPLACE TABLE edw_nprd_aen.bronze.test_table (
  id INT
);

INSERT INTO edw_nprd_aen.bronze.test_table
VALUES (1);

SELECT *
FROM edw_nprd_aen.bronze.test_table

(screenshot: SELECT output)

df = spark.createDataFrame([[1, 2], [3, 4]], 'id int, nest_col int')
df.display()
df.createOrReplaceTempView('v_final')

(screenshot: df.display() output)

df_cols = set([col.upper() for col in df.columns])
table_cols = set([col.upper() for col in spark.table('edw_nprd_aen.bronze.test_table').columns])
all_cols = df_cols.union(table_cols)

merge_string = f"""
    MERGE WITH SCHEMA EVOLUTION INTO edw_nprd_aen.bronze.test_table AS target
    USING v_final AS source
    ON target.id = source.id
    WHEN MATCHED THEN UPDATE SET
    {'\n, '.join(f'{col} = source.{col}' if col in df_cols else f'{col} = NULL' for col in all_cols)}
"""
print(merge_string)
spark.sql(merge_string)

-- Output

MERGE WITH SCHEMA EVOLUTION INTO edw_nprd_aen.bronze.test_table AS target
USING v_final AS source
ON target.id = source.id
WHEN MATCHED THEN UPDATE SET
NEST_COL = source.NEST_COL
, ID = source.ID

[DELTA_CONFLICT_SET_COLUMN] There is a conflict from these SET columns: [`NEST_COL`, `nest_col`]. SQLSTATE: 42701
File <command-5916479188032645>, line 13
      5 merge_string = f"""
      6     MERGE WITH SCHEMA EVOLUTION INTO edw_nprd_aen.bronze.test_table AS target
      7     USING v_final AS source
   (...)
     10     {'\n, '.join(f'{col} = source.{col}' if col in df_cols else f'{col} = NULL' for col in all_cols)}
     11 """
     12 print(merge_string)
---> 13 spark.sql(merge_string)

The error reports a conflict between the same column referenced with two different cases.

The error disappears when I change the DataFrame's column names to upper case:

df = df.withColumnsRenamed({col: col.upper() for col in df.columns})
df.createOrReplaceTempView('v_final')

# Do merge and it works fine
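A more targeted workaround (a sketch; `build_set_clause` is a hypothetical helper, not a Databricks API) is to build the SET clause from each column's exact casing rather than uppercasing everything, so the names the statement references match the names schema evolution will add:

```python
def build_set_clause(source_cols, target_cols):
    """Build MERGE SET assignments without changing any column's case.

    Source columns are referenced with their exact source casing (these
    are the names schema evolution will add to the target); columns that
    exist only in the target are set to NULL using the target's casing.
    """
    source_upper = {c.upper() for c in source_cols}
    clauses = [f"{c} = source.{c}" for c in source_cols]
    clauses += [f"{c} = NULL" for c in target_cols if c.upper() not in source_upper]
    return "\n    , ".join(clauses)


# With the repro above, the assignments keep the source's casing:
print(build_set_clause(["id", "nest_col"], ["id"]))
```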

 

None of my case sensitive options are on.

(screenshot: Spark config with case sensitivity disabled)

Could someone explain what is happening here? I suspect it is a bug.

3 REPLIES

css-1029
New Contributor

Hi @Dhruv-22,

It's actually not a bug. Let me explain what's happening.

The Root Cause

The issue stems from how schema evolution works with Delta Lake's MERGE statement, combined with Spark SQL's case-insensitivity settings.

Here's the key insight: spark.sql.caseSensitive = false affects query resolution, but NOT schema evolution behavior.

When you run MERGE WITH SCHEMA EVOLUTION, Delta Lake needs to determine which columns to add to the target table. Here's what happens:

  1. Your source DataFrame has columns: id, nest_col (lowercase)
  2. Your MERGE statement references: NEST_COL, ID (uppercase, from your f-string)
  3. Delta sees NEST_COL in the SET clause but nest_col in the actual source schema
  4. Since schema evolution adds columns by exact name from the source, Delta tries to add nest_col (from DataFrame) while your SET clause references NEST_COL

Delta interprets this as two different columns being set—hence the conflict error.

Why Case Sensitivity Setting Doesn't Help

The spark.sql.caseSensitive = false setting controls how Spark resolves column references in queries. But during schema evolution, Delta Lake performs a literal schema merge where column names are matched exactly as they appear in the source DataFrame's schema.

So even though Spark treats NEST_COL and nest_col as equivalent for query purposes, Delta's schema evolution logic sees them as distinct when comparing your SET clause against the source schema.
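That exact-name matching can be illustrated outside Spark (a minimal sketch of the comparison, not Delta's actual implementation): a SET reference to a column that is new to the target, but whose name differs only in case from the source schema's name, yields two distinct names for one logical column:

```python
def detect_evolution_conflicts(set_columns, source_schema, target_schema):
    """Sketch of why the MERGE above fails.

    References to columns that already exist in the target resolve
    case-insensitively, but a reference to a *new* column is compared
    against the source schema by exact string, so a case mismatch looks
    like two different columns.
    """
    target_upper = {c.upper() for c in target_schema}
    conflicts = []
    for ref in set_columns:
        if ref.upper() in target_upper:
            continue  # existing target column: case-insensitive resolution
        for src in source_schema:
            if ref != src and ref.upper() == src.upper():
                conflicts.append((ref, src))  # new column: exact-name clash
    return conflicts


# The repro: SET references NEST_COL/ID, source has id/nest_col, target has id
print(detect_evolution_conflicts(["NEST_COL", "ID"], ["id", "nest_col"], ["id"]))
```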

Hi @css-1029 

Thanks for the response, I understand the mechanics behind what is happening. But why is it not a bug? If casing is to be respected, it should be respected across all columns and all merge statements. Why does it fail only when adding a new column in one particular type of merge statement (schema evolution)? The error seems unintended.

Hi @Dhruv-22 ,
I totally agree with you. A configuration setting like spark.sql.caseSensitive should be a system-wide contract. If schema evolution silently ignores this setting and does literal string matching on column names, that's a violation of the expected behavior. The error message even references both NEST_COL and nest_col as if they're different columns—which contradicts what the case sensitivity setting promises.