Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

withColumnRenamed error on Unity Catalog 14.3 LTS

ksamborn
New Contributor II

Hi - 

 

We are migrating to Unity Catalog 14.3 LTS and have seen a change in behavior using withColumnRenamed.

There is a COLUMN_ALREADY_EXISTS error on the join key, even though the column being renamed is a different column. The joined DataFrame behaves fine before the renaming.

We use the join syntax using a join expression like this:


df_1 = spark.createDataFrame([(1, "a", 1)], ["key", "field1", "field2"])
df_2 = spark.createDataFrame([(1, "test", 1000)], ["key", "data1", "data2"])

# this fails with UC and works fine with non-UC

join_df = df_1.join(df_2, df_1.key == df_2.key)
#join_df = join_df.drop(df_2.key)

rename_df = join_df.withColumnRenamed("data1", "rename_data1")

join_df.display()
rename_df.display()


We receive this error:

[COLUMN_ALREADY_EXISTS] The column `key` already exists. Consider to choose another name or rename the existing column. SQLSTATE: 42711

A few other observations:

- If we pass the join key as a string instead, the behavior is fine: join_df2 = df_1.join(df_2, "key")

- If we uncomment the drop line above, it's fine

- Aliasing the two tables before the join doesn't help

- It works fine in 13.3 LTS; the error appeared in 14.0 and later

 

Is there something we are doing wrong or should be doing differently?

 

Thanks for your help!

 

2 REPLIES

Palash01
Valued Contributor

Hey @ksamborn 

I can think of 2 solutions:

  1. Rename the conflicting column in df_2 before joining:

 

df_2_renamed = df_2.withColumnRenamed("key", "key_2")
join_df = df_1.join(df_2_renamed, df_1.key == df_2_renamed.key_2)
rename_df = join_df.withColumnRenamed("data1", "rename_data1")

 

 2. Use aliases for the two tables before the join:

 

df_1_alias = df_1.alias("t1")
df_2_alias = df_2.alias("t2")
join_df = df_1_alias.join(df_2_alias, df_1_alias.key == df_2_alias.key)
rename_df = join_df.withColumnRenamed("t2.data1", "rename_data1")

 

Let us know if this works; otherwise, follow-ups are appreciated.

 

Leave a like if this helps! Kudos,
Palash

ksamborn
New Contributor II

Hi Palash

Thanks for your reply.

In short, renaming the column before the join (and before the later withColumnRenamed call) works, as does selecting only the columns that aren't duplicate keys.

The aliasing does not work. In fact, I've used the standard example from the PySpark documentation, and withColumnRenamed() fails when I add the last two lines here:

from pyspark.sql.functions import col, desc
df = spark.createDataFrame(
    [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
df_as1 = df.alias("df_as1")
df_as2 = df.alias("df_as2")
joined_df = df_as1.join(df_as2, col("df_as1.name") == col("df_as2.name"), 'inner')

joined_df.select(
    "df_as1.name", "df_as2.name", "df_as2.age").sort(desc("df_as1.name")).show()

renamed_df = joined_df.withColumnRenamed("df_as1.age", "age_renamed")
renamed_df.show()

 

I get exceptions for both of the last two lines when running in an Azure Databricks 14.3 LTS notebook.

For what it’s worth, I asked an AI what is going on here and received a response claiming that the behavior of `withColumnRenamed()` changes when Unity Catalog is enabled because of data lineage tracking:

[attached screenshot: ksamborn_0-1708957061285.png]

 

However, I cannot find any official references to this. (And it isn't exactly what we are experiencing.)

Does anyone know anything about this?   Thanks for your help.

Kevin

 

 
