<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Merge operation to delta table with new column starting with upper case seems to be not working in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/merge-operation-to-delta-table-with-new-column-starting-with/m-p/105152#M42016</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have a simple spark dataframe saved to a delta table:&lt;/P&gt;&lt;PRE&gt;data = [&lt;BR /&gt;    (1, "John", "Doe"),&lt;BR /&gt;    (2, "Jane", "Smith"),&lt;BR /&gt;    (3, "Mike", "Johnson"),&lt;BR /&gt;    (4, "Emily", "Davis")&lt;BR /&gt;]&lt;BR /&gt;columns = ["Id", "First_name", "Last_name"]&lt;BR /&gt;df = spark.createDataFrame(data, schema=columns)&lt;BR /&gt;&lt;BR /&gt;df.write.format('delta').mode('overwrite') \&lt;BR /&gt;    .option('delta.columnMapping.mode', 'name') \&lt;BR /&gt;    .save(delta_path)&lt;/PRE&gt;&lt;P&gt;I want to merge another dataframe to the delta table, containing a new column 'Age'. I have schema evolution enabled, so I would expect the new column to appear in the delta table, but it doesn't.&lt;/P&gt;&lt;PRE&gt;data = [&lt;BR /&gt;    (1, "John2", "Doe2", 25),&lt;BR /&gt;    (2, "Jane2", "Smith2", 30),&lt;BR /&gt;    (30, "Mike2", "Johnson2", 35),&lt;BR /&gt;    (4, "Emily2", "Davis2", 40)&lt;BR /&gt;]&lt;BR /&gt;columns = ["Id", "First_name", "Last_name", "Age"]&lt;BR /&gt;df = spark.createDataFrame(data, schema=columns)&lt;BR /&gt;&lt;BR /&gt;spark.conf.set('spark.databricks.delta.schema.autoMerge.enabled', 'true')&lt;BR /&gt;&lt;BR /&gt;dt = DeltaTable.forPath(spark, delta_path)&lt;BR /&gt;&lt;BR /&gt;dt.alias('existing') \&lt;BR /&gt;    .merge(df.alias('updates'), f"existing.Id = updates.Id") \&lt;BR /&gt;    .whenMatchedUpdate(&lt;BR /&gt;    set = {&lt;BR /&gt;        'Last_name': 'updates.Last_name'&lt;BR /&gt;    }) \&lt;BR /&gt;    .whenNotMatchedInsert(values = {c: f"updates.{c}" for c in columns}) \&lt;BR /&gt;    .execute()&lt;/PRE&gt;&lt;P&gt;If I change the new columns casing to 'age' then it is added to the delta table.&lt;/P&gt;&lt;P&gt;Am I doing something wrong? Does a column name starting with uppercase has any special meaning?&lt;/P&gt;</description>
    <pubDate>Fri, 10 Jan 2025 11:12:44 GMT</pubDate>
    <dc:creator>alpar</dc:creator>
    <dc:date>2025-01-10T11:12:44Z</dc:date>
    <item>
      <title>Merge operation to delta table with new column starting with upper case seems to be not working</title>
      <link>https://community.databricks.com/t5/data-engineering/merge-operation-to-delta-table-with-new-column-starting-with/m-p/105152#M42016</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have a simple spark dataframe saved to a delta table:&lt;/P&gt;&lt;PRE&gt;data = [&lt;BR /&gt;    (1, "John", "Doe"),&lt;BR /&gt;    (2, "Jane", "Smith"),&lt;BR /&gt;    (3, "Mike", "Johnson"),&lt;BR /&gt;    (4, "Emily", "Davis")&lt;BR /&gt;]&lt;BR /&gt;columns = ["Id", "First_name", "Last_name"]&lt;BR /&gt;df = spark.createDataFrame(data, schema=columns)&lt;BR /&gt;&lt;BR /&gt;df.write.format('delta').mode('overwrite') \&lt;BR /&gt;    .option('delta.columnMapping.mode', 'name') \&lt;BR /&gt;    .save(delta_path)&lt;/PRE&gt;&lt;P&gt;I want to merge another dataframe to the delta table, containing a new column 'Age'. I have schema evolution enabled, so I would expect the new column to appear in the delta table, but it doesn't.&lt;/P&gt;&lt;PRE&gt;data = [&lt;BR /&gt;    (1, "John2", "Doe2", 25),&lt;BR /&gt;    (2, "Jane2", "Smith2", 30),&lt;BR /&gt;    (30, "Mike2", "Johnson2", 35),&lt;BR /&gt;    (4, "Emily2", "Davis2", 40)&lt;BR /&gt;]&lt;BR /&gt;columns = ["Id", "First_name", "Last_name", "Age"]&lt;BR /&gt;df = spark.createDataFrame(data, schema=columns)&lt;BR /&gt;&lt;BR /&gt;spark.conf.set('spark.databricks.delta.schema.autoMerge.enabled', 'true')&lt;BR /&gt;&lt;BR /&gt;dt = DeltaTable.forPath(spark, delta_path)&lt;BR /&gt;&lt;BR /&gt;dt.alias('existing') \&lt;BR /&gt;    .merge(df.alias('updates'), f"existing.Id = updates.Id") \&lt;BR /&gt;    .whenMatchedUpdate(&lt;BR /&gt;    set = {&lt;BR /&gt;        'Last_name': 'updates.Last_name'&lt;BR /&gt;    }) \&lt;BR /&gt;    .whenNotMatchedInsert(values = {c: f"updates.{c}" for c in columns}) \&lt;BR /&gt;    .execute()&lt;/PRE&gt;&lt;P&gt;If I change the new columns casing to 'age' then it is added to the delta table.&lt;/P&gt;&lt;P&gt;Am I doing something wrong? Does a column name starting with uppercase has any special meaning?&lt;/P&gt;</description>
      <pubDate>Fri, 10 Jan 2025 11:12:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/merge-operation-to-delta-table-with-new-column-starting-with/m-p/105152#M42016</guid>
      <dc:creator>alpar</dc:creator>
      <dc:date>2025-01-10T11:12:44Z</dc:date>
    </item>
    <item>
      <title>Re: Merge operation to delta table with new column starting with upper case seems to be not working</title>
      <link>https://community.databricks.com/t5/data-engineering/merge-operation-to-delta-table-with-new-column-starting-with/m-p/105160#M42020</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/142211"&gt;@alpar&lt;/a&gt;,&lt;/P&gt;
&lt;P class="p1"&gt;The issue you’re experiencing is due to the way Delta Lake handles column names during schema evolution, especially when column mapping is enabled.&lt;/P&gt;
&lt;P class="p1"&gt;Here are the key points to understand:&lt;/P&gt;
&lt;P class="p1"&gt;Delta Lake schema evolution is case-insensitive by default.&lt;/P&gt;
&lt;P class="p1"&gt;When you enable column mapping with `delta.columnMapping.mode = 'name'`, it allows for more flexible column naming, including the use of special characters and spaces.&lt;/P&gt;
&lt;P class="p1"&gt;However, when schema evolution is performed, Delta Lake internally stores column names in lowercase.&lt;/P&gt;
&lt;P class="p1"&gt;In your case, the new column ‘Age’ is not being added because Delta Lake is treating it as equivalent to ‘age’ (which doesn’t exist in the original schema). When you change it to lowercase ‘age’, it’s recognized as a new column and added successfully.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Jan 2025 12:31:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/merge-operation-to-delta-table-with-new-column-starting-with/m-p/105160#M42020</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2025-01-10T12:31:38Z</dc:date>
    </item>
    <item>
      <title>Re: Merge operation to delta table with new column starting with upper case seems to be not working</title>
      <link>https://community.databricks.com/t5/data-engineering/merge-operation-to-delta-table-with-new-column-starting-with/m-p/105168#M42024</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/142211"&gt;@alpar&lt;/a&gt;,&amp;nbsp;&lt;SPAN&gt;the code below is working perfectly for me. You might want to check the DBR version you’re using. I’m running it on version 14.3 LTS without any errors.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Create New Delta table with columnMapping.mode='name'
data = [
    (1, "John", "Doe"),
    (2, "Jane", "Smith"),
    (3, "Mike", "Johnson"),
    (4, "Emily", "Davis")
]
columns = ["Id", "First_name", "Last_name"]
df = spark.createDataFrame(data, schema=columns)

df.write.format('delta').mode('overwrite') \
    .option('delta.columnMapping.mode', 'name') \
    .save(delta_path)

# Update table with Age column
data = [
    (1, "John2", "Doe2", 25),
    (2, "Jane2", "Smith2", 30),
    (30, "Mike2", "Johnson2", 35),
    (4, "Emily2", "Davis2", 40)
]
columns = ["Id", "First_name", "Last_name", "Age"]
df = spark.createDataFrame(data, schema=columns)

spark.conf.set('spark.databricks.delta.schema.autoMerge.enabled', 'true')

dt = DeltaTable.forPath(spark, delta_path)

dt.alias('existing') \
    .merge(df.alias('updates'), f"existing.Id = updates.Id") \
    .whenMatchedUpdate(
    set = {
        'Last_name': 'updates.Last_name'
    }) \
    .whenNotMatchedInsert(values = {c: f"updates.{c}" for c in columns}) \
    .execute()&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 10 Jan 2025 12:50:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/merge-operation-to-delta-table-with-new-column-starting-with/m-p/105168#M42024</guid>
      <dc:creator>hari-prasad</dc:creator>
      <dc:date>2025-01-10T12:50:51Z</dc:date>
    </item>
    <item>
      <title>Re: Merge operation to delta table with new column starting with upper case seems to be not working</title>
      <link>https://community.databricks.com/t5/data-engineering/merge-operation-to-delta-table-with-new-column-starting-with/m-p/105178#M42028</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106294"&gt;@Alberto_Umana&lt;/a&gt;&amp;nbsp;Columnmapping doesn’t seem to affect this behaviour, it’s not working even if columnMapping is not set.&lt;BR /&gt;If it’s really the case that you cannot add a new column containing any uppercase letters, I think it would be worth at least mentioning in the docs. Otherwise it can really cause confusion, as it did for me, since both spark dataframe and delta table column names are case-sensitive, so I expected that I can merge a new column with uppercase letters.&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/98469"&gt;@hari-prasad&lt;/a&gt;&amp;nbsp;Actually, I’m using Spark 3.4 with Delta Lake 2.4. It is not throwing an error (I would prefer if it did), it’s just the new ‘Age’ column is not added to the delta table. Is this something that’s fixed in newer versions of Delta Lake?&lt;/P&gt;&lt;P&gt;&lt;!--  notionvc: 7d93460e-5df2-4830-aedc-3f0de5afb46c  --&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 10 Jan 2025 13:42:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/merge-operation-to-delta-table-with-new-column-starting-with/m-p/105178#M42028</guid>
      <dc:creator>alpar</dc:creator>
      <dc:date>2025-01-10T13:42:20Z</dc:date>
    </item>
    <item>
      <title>Re: Merge operation to delta table with new column starting with upper case seems to be not working</title>
      <link>https://community.databricks.com/t5/data-engineering/merge-operation-to-delta-table-with-new-column-starting-with/m-p/105184#M42032</link>
      <description>&lt;P&gt;I assume you must be facing an error referred here on GitHub issues page. you can follow it, they make release fix for same.&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/delta-io/delta/issues/3336" target="_blank"&gt;[BUG][Spark] issue when merge using autoMerge property · Issue #3336 · delta-io/delta · GitHub&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 10 Jan 2025 14:12:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/merge-operation-to-delta-table-with-new-column-starting-with/m-p/105184#M42032</guid>
      <dc:creator>hari-prasad</dc:creator>
      <dc:date>2025-01-10T14:12:16Z</dc:date>
    </item>
  </channel>
</rss>

