Hello,
I have a simple Spark DataFrame saved to a Delta table:
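```python
# Assumes an active SparkSession `spark` with Delta Lake configured,
# and `delta_path` set to the target table location.
data = [
    (1, "John", "Doe"),
    (2, "Jane", "Smith"),
    (3, "Mike", "Johnson"),
    (4, "Emily", "Davis"),
]
columns = ["Id", "First_name", "Last_name"]
df = spark.createDataFrame(data, schema=columns)

# Create the table with column mapping by name enabled
df.write.format('delta').mode('overwrite') \
    .option('delta.columnMapping.mode', 'name') \
    .save(delta_path)
```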
I want to merge another DataFrame, which contains a new column 'Age', into the Delta table. I have schema evolution enabled, so I would expect the new column to appear in the Delta table, but it doesn't.
```python
from delta.tables import DeltaTable

# Source rows: Ids 1, 2 and 4 already exist in the target; Id 30 is new
data = [
    (1, "John2", "Doe2", 25),
    (2, "Jane2", "Smith2", 30),
    (30, "Mike2", "Johnson2", 35),
    (4, "Emily2", "Davis2", 40),
]
columns = ["Id", "First_name", "Last_name", "Age"]
df = spark.createDataFrame(data, schema=columns)

# Enable automatic schema evolution for MERGE
spark.conf.set('spark.databricks.delta.schema.autoMerge.enabled', 'true')

dt = DeltaTable.forPath(spark, delta_path)
dt.alias('existing') \
    .merge(df.alias('updates'), "existing.Id = updates.Id") \
    .whenMatchedUpdate(set={
        'Last_name': 'updates.Last_name'
    }) \
    .whenNotMatchedInsert(values={c: f"updates.{c}" for c in columns}) \
    .execute()
```
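For reference, this is what the merge above should produce if the new column is added. It is a plain-Python model of the MERGE semantics (matched rows update only `Last_name`; unmatched rows are inserted with every source column), not Delta Lake's implementation, and `simulate_merge` is a hypothetical helper used only for illustration.

```python
# Plain-Python model of the MERGE above (NOT Delta Lake's implementation).
# `simulate_merge` is a hypothetical helper for illustration only.

def simulate_merge(target, source, key, update_cols):
    """Return merged rows keyed by `key`.

    target/source: lists of dicts, one dict per row.
    update_cols: columns overwritten on matched rows (like whenMatchedUpdate).
    Unmatched source rows are inserted with all their columns.
    """
    # Target columns first, then any new source columns appended at the end
    all_cols = list(target[0].keys())
    for c in source[0].keys():
        if c not in all_cols:
            all_cols.append(c)

    # Existing rows get None (NULL) for columns they never had
    merged = {row[key]: {c: row.get(c) for c in all_cols} for row in target}
    for row in source:
        if row[key] in merged:
            # Matched: only the listed columns change
            for c in update_cols:
                merged[row[key]][c] = row[c]
        else:
            # Not matched: insert the full source row
            merged[row[key]] = {c: row.get(c) for c in all_cols}
    return merged


target = [
    {"Id": 1, "First_name": "John", "Last_name": "Doe"},
    {"Id": 2, "First_name": "Jane", "Last_name": "Smith"},
    {"Id": 3, "First_name": "Mike", "Last_name": "Johnson"},
    {"Id": 4, "First_name": "Emily", "Last_name": "Davis"},
]
source = [
    {"Id": 1, "First_name": "John2", "Last_name": "Doe2", "Age": 25},
    {"Id": 2, "First_name": "Jane2", "Last_name": "Smith2", "Age": 30},
    {"Id": 30, "First_name": "Mike2", "Last_name": "Johnson2", "Age": 35},
    {"Id": 4, "First_name": "Emily2", "Last_name": "Davis2", "Age": 40},
]

result = simulate_merge(target, source, key="Id", update_cols=["Last_name"])
for row_id in sorted(result):
    print(result[row_id])
```

Even with the column added, only the inserted row (Id 30) would carry an Age value; Ids 1, 2, 3 and 4 keep Age as null because the matched clause sets only `Last_name`.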
If I change the new column's name to lowercase 'age', it is added to the Delta table.
Am I doing something wrong? Does a column name starting with an uppercase letter have any special meaning?
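Spark resolves column names case-insensitively by default (`spark.sql.caseSensitive` defaults to `false`), so from the analyzer's point of view 'Age' and 'age' should be equally new to this table. The sketch below is a plain-Python illustration of that comparison, not Delta's schema-evolution code; `new_columns` is a hypothetical helper.

```python
def new_columns(target_cols, source_cols, case_sensitive=False):
    """Return source columns missing from the target, in source order.

    With case_sensitive=False this mimics Spark's default
    spark.sql.caseSensitive=false name matching (illustration only).
    """
    norm = (lambda c: c) if case_sensitive else str.lower
    existing = {norm(c) for c in target_cols}
    return [c for c in source_cols if norm(c) not in existing]


target_cols = ["Id", "First_name", "Last_name"]
print(new_columns(target_cols, ["Id", "First_name", "Last_name", "Age"]))  # ['Age']
print(new_columns(target_cols, ["Id", "First_name", "Last_name", "age"]))  # ['age']
```

Under this comparison both spellings count as a new column, which is why a casing-dependent result is unexpected.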