Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta file partitions

thushar
Contributor

We have a function that creates files with partitions; the partition columns come from metadata we maintain (getPartitionColumns). For one table, two columns are listed as partition columns, say 'Team' and 'Speciality'.

While executing, the partition columns are not substituted properly in the DataFrame's write method, and I get an error like the one below:

AnalysisException: Partition column `"Team","Speciality"` not found in schema

But these columns already exist in the DataFrame. Any idea how to resolve this?

It seems the value `"Team","Speciality"` is treated as a single column instead of two separate columns.

def dfWrite(df, targetPath, tableName):
    partitionColumn = getPartitionColumns(tableName)  # "Team", "Speciality"
    df.write.option("header", True) \
        .partitionBy(partitionColumn) \
        .mode("overwrite") \
        .csv(targetPath)

4 REPLIES

pvignesh92
Honored Contributor

Hi Thushar,

You have not mentioned the return type of the getPartitionColumns method. You need to return the partition columns as a collection, e.g. a list: ['Team', 'Speciality'].

Then the method below should work:

df.write.option("header", True) \
    .partitionBy(*partitionColumn) \
    .mode("overwrite") \
    .csv(targetPath)

Kindly try.

Hi Vignesh,

Thanks, the return type was a string; I converted it to a tuple and it is working now.
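For anyone hitting the same issue: a minimal sketch of the fix, assuming getPartitionColumns returns a single string like '"Team","Speciality"' (the helper name parse_partition_columns is hypothetical, not part of the original code). The idea is to split the string into a list of clean column names and unpack it with * into partitionBy:

```python
# Hypothetical helper: turn a metadata string such as '"Team","Speciality"'
# into a list of column names that partitionBy can accept.
def parse_partition_columns(raw):
    # Split on commas, then strip whitespace and surrounding quotes
    # from each piece; drop any empty fragments.
    return [c.strip().strip('"').strip("'") for c in raw.split(",") if c.strip()]

cols = parse_partition_columns('"Team","Speciality"')
# cols is now ['Team', 'Speciality'], so the write becomes:
# df.write.option("header", True).partitionBy(*cols).mode("overwrite").csv(targetPath)
```

Passing the raw string to partitionBy makes Spark look for one column literally named `"Team","Speciality"`, which is why the AnalysisException reports that column as missing from the schema.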

pvignesh92
Honored Contributor

Hi Thushar,

Please upvote and mark this as the accepted answer so that the thread can be closed.

Anonymous
Not applicable

Hi @Thushar R

Hope everything is going great.

Just wanted to check in on whether you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please let us know so we can help you.

Cheers!
