Auto Loader fails when reading a JSON element whose name contains a space
07-11-2023 02:42 AM
I'm using Auto Loader as part of a Delta Live Tables pipeline to ingest JSON files, and today it failed with this error message:
com.databricks.sql.transaction.tahoe.DeltaAnalysisException: Found invalid character(s) among ' ,;{}()\n\t=' in the column names of your schema.
org.apache.spark.sql.AnalysisException: Column name "NotificationSettings.element.Microsoft Teams" contains invalid character(s). Please use alias to rename it.
The failing JSON file contains an element named "Microsoft Teams", which causes the pipeline to fail. How can I handle such elements? The error message mentions using an alias, but I can't find any information on how to implement this.
"NotificationSettings": [
{
"NotificationType": "MissedActivityReminder",
"Microsoft Teams": true
},
...
07-13-2023 09:52 AM
Please check if this helps:
07-14-2023 12:36 AM
I could not get the DeltaTable solution to work in combination with Auto Loader/DLT/Unity Catalog, since it expects a table location and I want the framework to handle that.
I also tried withColumnRenamed, but I can't get it to work either. I still get the error message shown in my original question.
withColumnRenamed("NotificationSettings.Microsoft Teams", "MicrosoftTeams")
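A likely reason the call above has no effect: withColumnRenamed only renames top-level columns, while "Microsoft Teams" is a field of the structs inside the NotificationSettings array. A minimal sketch of rebuilding those structs with cleaned field names is shown here. The helper names clean_name and clean_array_of_structs are hypothetical, the code assumes PySpark 3.1 or later (for F.transform) and an array of flat structs, and it has not been verified inside a DLT/Unity Catalog pipeline.

```python
import re

# Characters Delta rejects in column names, taken from the error message.
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def clean_name(name: str) -> str:
    """Drop rejected characters, e.g. 'Microsoft Teams' -> 'MicrosoftTeams'."""
    return INVALID_CHARS.sub("", name)


def clean_array_of_structs(df, column: str):
    """Rebuild every struct in an array column with cleaned field names.

    Hypothetical helper: handles one level of nesting only and does not
    guard against two fields collapsing to the same cleaned name.
    """
    # Imported here so clean_name stays usable without a Spark installation.
    from pyspark.sql import functions as F

    # Field list of the struct type stored inside the array column.
    fields = df.schema[column].dataType.elementType.fields
    return df.withColumn(
        column,
        F.transform(
            F.col(column),
            lambda item: F.struct(
                *[item[f.name].alias(clean_name(f.name)) for f in fields]
            ),
        ),
    )
```

In a pipeline, this would presumably be applied to the DataFrame returned by the Auto Loader read, e.g. clean_array_of_structs(df, "NotificationSettings"), before the DataFrame is written to the Delta table.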
07-14-2023 04:02 AM
- You can read the input file using pandas or Koalas (https://koalas.readthedocs.io/en/latest/index.html),
- then rename the columns,
- then convert the pandas/Koalas dataframe to a Spark dataframe. You can write it back with the corrected column names, so the error will not occur the next time you read it.
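Since the offending name in this thread is nested inside an array, renaming top-level dataframe columns may not reach it. As a rough sketch of the "rename, then write back" idea, the keys can be cleaned on the parsed JSON itself before any dataframe is built. The function name clean_keys and the inline sample are assumptions for illustration, modelled on the snippet in the question; only the Python standard library is used.

```python
import json
import re

# Characters Delta rejects in column names, taken from the error message.
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def clean_keys(value):
    """Recursively strip rejected characters from every object key."""
    if isinstance(value, dict):
        return {INVALID_CHARS.sub("", k): clean_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [clean_keys(v) for v in value]
    return value  # scalars are returned unchanged


# Sample shaped like the JSON in the question (illustrative only).
raw = (
    '{"NotificationSettings": [{"NotificationType": '
    '"MissedActivityReminder", "Microsoft Teams": true}]}'
)
cleaned = clean_keys(json.loads(raw))
print(json.dumps(cleaned))
```

The cleaned structure can then be written back as JSON or loaded into a dataframe. Note that removing characters can make two different keys collide, which this sketch does not detect.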