06-18-2022 04:02 AM
Hello,
I am trying to write Delta files for some CSV data. When I do
csv_dataframe.write.format("delta").save("/path/to/table.delta")
I get: AnalysisException:
Found invalid character(s) among " ,;{}()\n\t=" in the column names of your
schema.
Having looked at the docs, I expected the following to set the column mapping mode to "name" for all tables, which would avoid this error:
spark.conf.set("spark.databricks.delta.defaults.columnMapping.mode", "name")
Running this before invoking `write(...)` does not work and I get the same error.
I have managed to do it in SQL using TBLPROPERTIES with the CREATE TABLE statement like so:
CREATE TABLE table_bronze_csv
USING CSV
OPTIONS (path '/path/to/data.csv', 'header' 'true', 'mode' 'FAILFAST');
CREATE TABLE table_bronze
USING DELTA
TBLPROPERTIES ("delta.columnMapping.mode" = "name")
AS SELECT * FROM table_bronze_csv;
but am looking for the Python way of doing it.
Thanks
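As an aside, the character set quoted in the error message can be checked up front in plain Python before attempting the write. A minimal sketch (the character set is copied from the error message; the helper name and sample column names are my own, and applying it to the DataFrame above is shown only as a comment):

```python
import re

# Characters Delta rejects in column names, per the AnalysisException above.
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")

def invalid_columns(column_names):
    """Return the column names that Delta would reject."""
    return [name for name in column_names if INVALID_CHARS.search(name)]

# Example with hypothetical column names:
print(invalid_columns(["id", "first name", "total=sum", "amount"]))
# → ['first name', 'total=sum']

# Applying to the DataFrame above would be:
# invalid_columns(csv_dataframe.columns)
```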
06-18-2022 10:43 AM
Just pass the property as an option:
csv_dataframe.write.format("delta").option("delta.columnMapping.mode", "name").save(path)
06-18-2022 11:15 AM
Thanks @Hemant Kumar, I tried exactly what you said; the below is a copy/paste (with sanitized names):
table_bronze.write.format("delta").option("delta.columnMapping.mode", "name").save("/path/to/table.delta")
I got the same error. Is there a bug?
06-18-2022 11:37 AM
Sorry, I forgot to mention saveAsTable. Try this:
table_bronze.write.format("delta").option("delta.columnMapping.mode", "name").option("path","/path/to/table_bronze").saveAsTable("table_bronze")
06-18-2022 11:54 AM
Thanks, that worked. I can now query it with SQL. Can you explain why I have to use saveAsTable with the path set in an option? I thought calling `save()` was the way to do this kind of operation.
06-18-2022 12:36 PM
According to my understanding, it's a property of a Delta table, not of a Delta file. That's why it didn't work when you saved it as a Delta file.
04-27-2023 11:06 PM
I was able to save it as a Delta file. You need to specify minReaderVersion and minWriterVersion as well, e.g.:
spark_df.write.format("delta") \
    .mode("overwrite") \
    .option("delta.columnMapping.mode", "name") \
    .option("delta.minReaderVersion", "2") \
    .option("delta.minWriterVersion", "5") \
    .save("/path/to/table")
Reference: Table protocol versioning (Delta Lake documentation)
10-01-2024 04:38 PM
I still get the error with every method I try. Column names containing spaces throw:
[DELTA_INVALID_CHARACTERS_IN_COLUMN_NAMES] Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema.
df1.write.format("delta") \
.mode("overwrite") \
.option("mergeSchema", 'true') \
.option("delta.columnMapping.mode", "name") \
.option('delta.minReaderVersion', '2') \
.option('delta.minWriterVersion', '5') \
.option("path","/tmp/spark-delta-table") \
.saveAsTable("`spark-delta-table`")
Or
df1.write.format("delta") \
.mode("overwrite") \
.option("mergeSchema", 'true') \
.option("delta.columnMapping.mode", "name") \
.option('delta.minReaderVersion', '2') \
.option('delta.minWriterVersion', '5') \
.save("/tmp/spark-delta-table")
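If column mapping still won't apply through the DataFrameWriter, a common workaround is to rename the offending columns before writing. A minimal sketch, assuming `df1` as above (the underscore replacement rule and helper name are my own; only the character set comes from the error message, and the Spark calls are shown as comments):

```python
import re

# Character set copied from the DELTA_INVALID_CHARACTERS_IN_COLUMN_NAMES error.
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")

def sanitize_column(name: str) -> str:
    """Replace each character Delta rejects with an underscore."""
    return INVALID_CHARS.sub("_", name)

print(sanitize_column("first name"))  # → first_name

# Renaming every column on the DataFrame would then be:
# df1 = df1.toDF(*[sanitize_column(c) for c in df1.columns])
# df1.write.format("delta").mode("overwrite").save("/tmp/spark-delta-table")
```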