09-20-2024 09:24 AM
I'm not sure I'm doing this correctly, but I'm having some issues with column names when I try to load data into a table in our Databricks catalog. I have multiple .json.gz files in our blob container that I want to load into a table:
df = spark.read.option("multiline", "true").json(f"{LOC}/*.json.gz")
df.printSchema()
The schema looks something like this; for example, user_properties has the nested fields App Brnd and Archit:
|-- user_id: string (nullable = true)
|-- user_properties: struct (nullable = true)
| |-- App Brnd: string (nullable = true)
| |-- Archit: string (nullable = true)
When I try to write the df to our table for the first time:
df.write.mode("overwrite").saveAsTable("test.events")
I see this error:
Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema. Please use other characters and try again.
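The error comes from characters that Delta tables disallow in column names by default (here, the spaces in `App Brnd`). One common workaround is to replace the offending characters before writing. A minimal sketch, assuming underscores are an acceptable substitute; the `sanitize` helper is hypothetical, not from the post:

```python
import re

# The characters listed in the error message: ' ,;{}()\n\t='
INVALID = re.compile(r"[ ,;{}()\n\t=]")

def sanitize(name: str) -> str:
    """Replace characters Delta rejects in column names with underscores."""
    return INVALID.sub("_", name)

# sanitize("App Brnd") -> "App_Brnd"

# Top-level DataFrame columns could then be renamed before saving:
# for old in df.columns:
#     df = df.withColumnRenamed(old, sanitize(old))
#
# Nested struct fields (like user_properties.App Brnd) are not covered by
# withColumnRenamed; they need the schema rebuilt, e.g. by selecting the
# struct fields out with sanitized aliases and re-assembling the struct.
```

Alternatively, Delta's column mapping feature (table property `delta.columnMapping.mode = 'name'`) allows such characters in column names, at the cost of some compatibility constraints with older readers and writers.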