Hi Everyone,
Reading JSON files with Auto Loader fails to infer the schema correctly: every nested or struct column is inferred as a string.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", CHECKPOINT_PATH)
    .option("multiLine", True)
    .load(f"/Volumes/{CATALOG}/{SCHEMA}/files/")
)
When I read the same files with a normal Spark batch read, the schema is inferred correctly.
spark.read.format("json").option("multiline", True).load(f"/Volumes/{CATALOG}/{SCHEMA}/files/")
I've also deleted the checkpoint to see if that was causing the problem, but the result is still the same.
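One setting that may be relevant to the two reads above: as far as I understand the Auto Loader documentation, for formats that do not encode data types (JSON, CSV) Auto Loader infers every column as a string by default, including nested fields, unless `cloudFiles.inferColumnTypes` is enabled. The sketch below is a minimal, untested variation of the stream above with that option set. The helper names `auto_loader_json_options` and `read_json_stream` are hypothetical and only there for illustration.

```python
def auto_loader_json_options(schema_location):
    """Return Auto Loader options for a multi-line JSON source.

    `schema_location` is the directory where Auto Loader stores the
    inferred schema (CHECKPOINT_PATH in the snippet above).
    """
    return {
        "cloudFiles.format": "json",
        "cloudFiles.schemaLocation": schema_location,
        # Assumption: without this option Auto Loader keeps JSON columns
        # (including nested ones) as strings.
        "cloudFiles.inferColumnTypes": "true",
        "multiLine": "true",
    }


def read_json_stream(spark, source_path, schema_location):
    """Build the streaming DataFrame; `spark` is an existing SparkSession."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**auto_loader_json_options(schema_location))
        .load(source_path)
    )
```

It would be called as `read_json_stream(spark, f"/Volumes/{CATALOG}/{SCHEMA}/files/", CHECKPOINT_PATH).printSchema()`. Since the previously inferred schema is stored under the schema location, it probably needs to be an empty directory for the new inference to take effect.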
Here are the schemas to compare:
Auto Loader:
root
|-- changelog: string (nullable = true)
|-- issue: string (nullable = true)
|-- issue_event_type_name: string (nullable = true)
|-- timestamp: string (nullable = true)
|-- user: string (nullable = true)
|-- webhookEvent: string (nullable = true)
|-- project: string (nullable = true)
|-- year: string (nullable = true)
|-- month: string (nullable = true)
|-- day: string (nullable = true)
|-- issue_id: string (nullable = true)
|-- _rescued_data: string (nullable = true)
Normal Spark read:
root
|-- changelog: struct (nullable = true)
|    |-- id: string (nullable = true)
|    |-- items: array (nullable = true)
|    |    |-- element: struct (containsNull = true)
|    |    |    |-- field: string (nullable = true)
|    |    |    |-- fieldId: string (nullable = true)
|    |    |    |-- fieldtype: string (nullable = true)
|    |    |    |-- from: string (nullable = true)
|    |    |    |-- fromString: string (nullable = true)
|    |    |    |-- tmpFromAccountId: string (nullable = true)
|    |    |    |-- tmpToAccountId: string (nullable = true)
|    |    |    |-- to: string (nullable = true)
|    |    |    |-- toString: string (nullable = true)
|-- issue: struct (nullable = true)
|    |-- fields: struct (nullable = true)
|    |    |-- assignee: string (nullable = true)
|    |    |-- attachment: array (nullable = true)
|    |    |    |-- element: string (containsNull = true)
|    |    |-- components: array (nullable = true)
|    |    |    |-- element: string (containsNull = true)
|    |    |-- created: string (nullable = true)
|    |    |-- creator: struct (nullable = true)
|    |    |    |-- accountId: string (nullable = true)
|    |    |    |-- accountType: string (nullable = true)
|    |    |    |-- active: boolean (nullable = true)
|    |    |    |-- avatarUrls: struct (nullable = true)
|    |    |    |    |-- 16x16: string (nullable = true)
...