06-18-2022 04:02 AM
Hello,
I am trying to write Delta files for some CSV data. When I do
csv_dataframe.write.format("delta").save("/path/to/table.delta")
I get:
AnalysisException: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema.
Having looked up some docs, I expected the following to set the column mapping mode to "name" for all tables, which should avoid this error:
spark.conf.set("spark.databricks.delta.defaults.columnMapping.mode", "name")
Running this before invoking `write(...)` does not work and I get the same error.
I have managed to do it in SQL using TBLPROPERTIES with the CREATE TABLE statement like so:
CREATE TABLE table_bronze_csv
USING CSV
OPTIONS (path '/path/to/data.csv', 'header' 'true', 'mode' 'FAILFAST');
CREATE TABLE table_bronze
USING DELTA
TBLPROPERTIES ("delta.columnMapping.mode" = "name")
AS SELECT * FROM table_bronze_csv;
but am looking for the Python way of doing it.
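The closest I have found so far is just wrapping the same SQL in spark.sql() from Python. A rough sketch, reusing the sanitized table names above, though I would prefer a direct DataFrame writer option:
spark.sql("""
CREATE TABLE table_bronze
USING DELTA
TBLPROPERTIES ("delta.columnMapping.mode" = "name")
AS SELECT * FROM table_bronze_csv
""")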
Thanks
06-18-2022 10:43 AM
Just pass properties in option:
csv_dataframe.write.format("delta").option("delta.columnMapping.mode","name").save(path)
06-18-2022 11:15 AM
Thanks @Hemant Kumar, I tried exactly what you said, the below is a copy/paste (with sanitized names):
table_bronze.write.format("delta").option("delta.columnMapping.mode", "name").save("/path/to/table.delta")
I got the same error. Is there a bug?
06-18-2022 11:37 AM
Sorry, I forgot to mention saveAsTable. Try this:
table_bronze.write.format("delta").option("delta.columnMapping.mode", "name").option("path","/path/to/table_bronze").saveAsTable("table_bronze")
06-18-2022 11:54 AM
Thanks, that worked. I can now query it with SQL. Can you explain why I have to use saveAsTable with the path set in an option? I thought calling `save()` was the way to do this kind of operation.
06-18-2022 12:36 PM
According to my understanding, it's a property of the Delta table, not of a Delta file. That's why it didn't work when you saved it as a Delta file.
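To show that it really is a table property, here is a minimal sketch (assuming the table_bronze name from earlier in the thread) that enables it on an existing table with ALTER TABLE, which also requires raising the reader/writer protocol versions:
spark.sql("""
ALTER TABLE table_bronze SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
)
""")
spark.sql("SHOW TBLPROPERTIES table_bronze").show(truncate=False)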
04-27-2023 11:06 PM
I was able to save it as a Delta file. You need to specify minReaderVersion and minWriterVersion as well, for example:
spark_df.write.format("delta") \
    .mode('overwrite') \
    .option("delta.columnMapping.mode", "name") \
    .option('delta.minReaderVersion', '2') \
    .option('delta.minWriterVersion', '5') \
    .save('/path/to/table')
Reference: Table protocol versioning — Delta Lake Documentation
2 weeks ago
I still get the error with every method I try. The column names with spaces are throwing this error:
[DELTA_INVALID_CHARACTERS_IN_COLUMN_NAMES] Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema.
df1.write.format("delta") \
.mode("overwrite") \
.option("mergeSchema", 'true') \
.option("delta.columnMapping.mode", "name") \
.option('delta.minReaderVersion', '2') \
.option('delta.minWriterVersion', '5') \
.option("path","/tmp/spark-delta-table") \
.saveAsTable("`spark-delta-table`")
Or
df1.write.format("delta") \
.mode("overwrite") \
.option("mergeSchema", 'true') \
.option("delta.columnMapping.mode", "name") \
.option('delta.minReaderVersion', '2') \
.option('delta.minWriterVersion', '5') \
.save("/tmp/spark-delta-table")