06-18-2022 04:02 AM
Hello,
I am trying to write Delta files for some CSV data. When I do
csv_dataframe.write.format("delta").save("/path/to/table.delta")
I get: AnalysisException:
Found invalid character(s) among " ,;{}()\n\t=" in the column names of your
schema.
Having looked up some docs, I expected the following to set the column mapping mode to "name" for all tables, which I assumed would avoid this error:
spark.conf.set("spark.databricks.delta.defaults.columnMapping.mode", "name")
Running this before invoking `write(...)` does not work and I get the same error.
I have managed to do it in SQL using TBLPROPERTIES with the CREATE TABLE statement like so:
CREATE TABLE table_bronze_csv
USING CSV
OPTIONS (path '/path/to/data.csv', 'header' 'true', 'mode' 'FAILFAST');
CREATE TABLE table_bronze
USING DELTA
TBLPROPERTIES ("delta.columnMapping.mode" = "name")
AS SELECT * FROM table_bronze_csv;
but am looking for the Python way of doing it.
Thanks
06-18-2022 10:43 AM
Just pass properties in option:
csv_dataframe.write.format("delta").option("delta.columnMapping.mode","name").save(path)
06-18-2022 11:15 AM
Thanks @Hemant Kumar, I tried exactly what you said, the below is a copy/paste (with sanitized names):
table_bronze.write.format("delta").option("delta.columnMapping.mode", "name").save("/path/to/table.delta")
I got the same error. Is there a bug?
06-18-2022 11:37 AM
Sorry, I forgot to mention `saveAsTable`; try this:
table_bronze.write.format("delta").option("delta.columnMapping.mode", "name").option("path","/path/to/table_bronze").saveAsTable("table_bronze")
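Once the table exists, you can confirm the property actually took effect. A minimal check from a SQL cell, assuming the table name `table_bronze` from the snippet above:

SHOW TBLPROPERTIES table_bronze;
-- the output should include a row with key 'delta.columnMapping.mode' and value 'name'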
06-18-2022 11:54 AM
Thanks, that worked. I can now query it with SQL. Can you explain why I have to use `saveAsTable` with the path set in an option? I thought calling `save()` was the way to do this kind of operation.
06-18-2022 12:36 PM
As I understand it, `delta.columnMapping.mode` is a property of a Delta table, not of a Delta file. That's why it didn't work when you saved it as a Delta file.
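Since it is a table property, another option is to set it on an already-registered Delta table with ALTER TABLE. A sketch, assuming the table name `table_bronze` from earlier in the thread; note that column mapping also requires upgrading the table protocol, so the reader/writer versions are set in the same statement:

ALTER TABLE table_bronze SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
);

From Python this can be run as `spark.sql("ALTER TABLE ...")`.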
04-27-2023 11:06 PM
I was able to save it as a Delta file. You need to specify `delta.minReaderVersion` and `delta.minWriterVersion` as well, e.g.:
spark_df.write.format("delta").mode('overwrite').option("delta.columnMapping.mode", "name").option('delta.minReaderVersion', '2').option('delta.minWriterVersion', '5').save('/path/to/table')
Reference: Table protocol versioning — Delta Lake Documentation