Load parent columns without unnesting using PySpark? Found invalid character(s) ' ,;{}()\n' in schema

KristiLogos

I'm not sure I'm going about this correctly, but I'm having issues with the column names when I try to load data into a table in our Databricks catalog. I have multiple .json.gz files in our blob container that I want to load into a table:

df = spark.read.option("multiline", "true").json(f"{LOC}/*.json.gz")
df.printSchema()
 
The schema looks something like this. For example, user_properties is a struct with the nested fields "App Brnd" and "Archit":
 

|-- user_id: string (nullable = true)
|-- user_properties: struct (nullable = true)
| |-- App Brnd: string (nullable = true)
| |-- Archit: string (nullable = true)
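The nested field "App Brnd" contains a space, which is one of the characters the error complains about, even though the top-level column names (user_id, user_properties) are fine. A minimal sketch to list every offending field path, assuming you pass it the JSON form of the schema from `df.schema.jsonValue()` (the helper name `invalid_paths` is my own, hypothetical):

```python
import re

# Characters listed in the error message: space , ; { } ( ) newline tab =
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def invalid_paths(dtype, prefix=""):
    """Return dotted paths of fields whose names contain an invalid character.

    `dtype` is the JSON form of a Spark schema (e.g. df.schema.jsonValue()):
    primitive types are plain strings, complex types are dicts.
    """
    found = []
    if not isinstance(dtype, dict):
        return found  # primitive type, nothing nested to inspect
    kind = dtype.get("type")
    if kind == "struct":
        for field in dtype["fields"]:
            path = f"{prefix}.{field['name']}" if prefix else field["name"]
            if INVALID_CHARS.search(field["name"]):
                found.append(path)
            # Recurse into nested structs/arrays/maps under this field
            found.extend(invalid_paths(field["type"], path))
    elif kind == "array":
        found.extend(invalid_paths(dtype["elementType"], prefix + "[]"))
    elif kind == "map":
        found.extend(invalid_paths(dtype["valueType"], prefix + "{}"))
    return found
```

With the schema above, `invalid_paths(df.schema.jsonValue())` should point at `user_properties.App Brnd`.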
 
When I try to write the DataFrame to our table for the first time:

df.write.mode("overwrite").saveAsTable("test.events")
 
I see this error:
Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema. Please use other characters and try again.
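One way to keep user_properties as a single nested column (rather than unnesting it) is to rename only the offending fields and cast each top-level column to a struct type with the cleaned names, since Spark takes field names from the target type when casting struct to struct. A minimal sketch under that assumption; the helper names `sanitize`, `sanitize_type` and `sanitize_df` are hypothetical, and replacing every invalid character with "_" is just one possible naming choice:

```python
import re

# Characters listed in the error message: space , ; { } ( ) newline tab =
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize(name):
    """Replace each invalid character in a field name with an underscore."""
    return INVALID_CHARS.sub("_", name)


def sanitize_type(dtype):
    """Return a copy of a Spark schema's JSON form with all field names sanitized.

    Primitive types are plain strings and are returned unchanged; structs,
    arrays and maps are rebuilt recursively without mutating the input.
    """
    if isinstance(dtype, dict):
        kind = dtype.get("type")
        if kind == "struct":
            return {**dtype, "fields": [
                {**f, "name": sanitize(f["name"]), "type": sanitize_type(f["type"])}
                for f in dtype["fields"]
            ]}
        if kind == "array":
            return {**dtype, "elementType": sanitize_type(dtype["elementType"])}
        if kind == "map":
            return {**dtype,
                    "keyType": sanitize_type(dtype["keyType"]),
                    "valueType": sanitize_type(dtype["valueType"])}
    return dtype


def sanitize_df(df):
    """Rename nested fields by casting each top-level column to the cleaned type."""
    # Imported here so the pure-Python helpers above work without PySpark.
    from pyspark.sql.functions import col
    from pyspark.sql.types import StructType

    new_schema = StructType.fromJson(sanitize_type(df.schema.jsonValue()))
    return df.select([
        # Backticks quote the original name in case it contains spaces or dots
        col(f"`{old.name}`").cast(new.dataType).alias(new.name)
        for old, new in zip(df.schema.fields, new_schema.fields)
    ])
```

Usage would then be something like `sanitize_df(df).write.mode("overwrite").saveAsTable("test.events")`, which should give user_properties the fields App_Brnd and Archit. One caveat: two source names that differ only in invalid characters (e.g. "App Brnd" and "App_Brnd") would collide after renaming.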