06-18-2022 04:02 AM
Hello,
I am trying to write Delta files for some CSV data. When I do
csv_dataframe.write.format("delta").save("/path/to/table.delta")
I get: AnalysisException:
Found invalid character(s) among " ,;{}()\n\t=" in the column names of your
schema.
Having looked up some docs, I expected the following to set the column mapping mode to "name" for all tables, which I assumed would avoid this error:
spark.conf.set("spark.databricks.delta.defaults.columnMapping.mode", "name")
Running this before invoking `write(...)` does not work and I get the same error.
I have managed to do it in SQL using TBLPROPERTIES with the CREATE TABLE statement like so:
CREATE TABLE table_bronze_csv
USING CSV
OPTIONS (path '/path/to/data.csv', 'header' 'true', 'mode' 'FAILFAST');
CREATE TABLE table_bronze
USING DELTA
TBLPROPERTIES ("delta.columnMapping.mode" = "name")
AS SELECT * FROM table_bronze_csv;
but am looking for the Python way of doing it.
Thanks
06-18-2022 10:43 AM
Just pass properties in option:
csv_dataframe.write.format("delta").option("delta.columnMapping.mode","name").save(path)
06-18-2022 11:15 AM
Thanks @Hemant Kumar, I tried exactly what you said, the below is a copy/paste (with sanitized names):
table_bronze.write.format("delta").option("delta.columnMapping.mode", "name").save("/path/to/table.delta")
I got the same error. Is there a bug?
06-18-2022 11:37 AM
Sorry, I forgot to mention `saveAsTable`; try this:
table_bronze.write.format("delta").option("delta.columnMapping.mode", "name").option("path","/path/to/table_bronze").saveAsTable("table_bronze")
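Once the table exists, you can confirm the property actually took effect. A minimal check from a SQL cell, assuming the table name `table_bronze` from the snippet above:

SHOW TBLPROPERTIES table_bronze;
-- the output should include a row with key 'delta.columnMapping.mode' and value 'name'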
06-18-2022 11:54 AM
Thanks, that worked. I can now query it with SQL. Can you explain why I have to use `saveAsTable` with the path set in an option? I thought calling `save()` was the way to do this kind of operation.
06-18-2022 12:36 PM
As I understand it, `delta.columnMapping.mode` is a property of a Delta table, not of a Delta file. That's why it didn't work when you saved it as a Delta file.
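Since it is a table property, another option is to set it on an already-registered Delta table with ALTER TABLE. A sketch, assuming the table name `table_bronze` from earlier in the thread; note that column mapping also requires upgrading the table protocol, so the reader/writer versions are set in the same statement:

ALTER TABLE table_bronze SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
);

From Python this can be run as `spark.sql("ALTER TABLE ...")`.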
04-27-2023 11:06 PM
I was able to save it as a Delta file. You need to specify `delta.minReaderVersion` and `delta.minWriterVersion` as well, e.g.:
spark_df.write.format("delta").mode('overwrite').option("delta.columnMapping.mode", "name").option('delta.minReaderVersion', '2').option('delta.minWriterVersion', '5').save('/path/to/table')
Reference: Table protocol versioning — Delta Lake Documentation