Load parent columns and not unnest using pyspark? ...

KristiLogos · 09-20-2024

I'm not sure I'm approaching this correctly, but I'm running into issues with column names when I try to load data into a table in our Databricks catalog. I have multiple .json.gz files in our blob container that I want to load into a single table:

df = spark.read.option("multiline", "true").json(f"{LOC}/*.json.gz")

df.printSchema()

The schema looks something like this. For example, user_properties is a struct with the nested fields App Brnd and Archit:

|-- user_id: string (nullable = true)
|-- user_properties: struct (nullable = true)
|    |-- App Brnd: string (nullable = true)
|    |-- Archit: string (nullable = true)

When I try to write the df to our table for the first time:

df.write.mode("overwrite").saveAsTable("test.events")

I see this error:
Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema. Please use other characters and try again.
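To narrow down which field names trip this check, here is a minimal sketch that walks a schema and lists every name containing one of the characters from the error message. It assumes the schema is available in the dict form returned by df.schema.jsonValue(). The helper name find_invalid_fields and the hand-written schema dict are made up for illustration and simply mirror the printSchema() output above.

```python
import re

# Characters listed in the error message: space , ; { } ( ) \n \t =
INVALID = re.compile(r"[ ,;{}()\n\t=]")


def find_invalid_fields(dtype, path=""):
    """Return dotted paths of field names that contain an invalid character.

    `dtype` is a schema in the dict form produced by df.schema.jsonValue();
    primitive types appear as plain strings such as "string".
    """
    bad = []
    if not isinstance(dtype, dict):
        return bad  # primitive type, nothing to descend into
    kind = dtype.get("type")
    if kind == "struct":
        for field in dtype["fields"]:
            full = f"{path}.{field['name']}" if path else field["name"]
            if INVALID.search(field["name"]):
                bad.append(full)
            # Nested structs, arrays, and maps can hide further offenders
            bad.extend(find_invalid_fields(field["type"], full))
    elif kind == "array":
        bad.extend(find_invalid_fields(dtype["elementType"], path))
    elif kind == "map":
        bad.extend(find_invalid_fields(dtype["valueType"], path))
    return bad


# Hypothetical stand-in for df.schema.jsonValue(), based on the schema above
schema = {
    "type": "struct",
    "fields": [
        {"name": "user_id", "type": "string", "nullable": True, "metadata": {}},
        {
            "name": "user_properties",
            "type": {
                "type": "struct",
                "fields": [
                    {"name": "App Brnd", "type": "string", "nullable": True, "metadata": {}},
                    {"name": "Archit", "type": "string", "nullable": True, "metadata": {}},
                ],
            },
            "nullable": True,
            "metadata": {},
        },
    ],
}

print(find_invalid_fields(schema))  # ['user_properties.App Brnd']
```

With this sample schema, only the nested App Brnd field is reported, because of the space in its name. This suggests that nested field names are checked as well, not just the top-level column names.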
