Hello,
I'm using Auto Loader to stream data into a table and have added schema hints to specify field types.
I've observed that when my initial data file is missing fields specified in the schema hint,
the auto loader correctly identifies this and adds them to the schema.
However, if the missing fields are nested within a struct, it throws an error stating "Couldn't find column example in:",
even though the option cloudFiles.inferColumnTypes is set to True.
For example, with the schema hints:
SCHEMA_HINTS = [
    'aaa TIMESTAMP',
    'bbb.ccc INT',
]
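As a rough sketch of how hints like these are passed to Auto Loader (assuming a JSON source; the helper name autoloader_options and both paths are placeholders for illustration, not the actual job):

```python
# Schema hints from the example above.
SCHEMA_HINTS = ["aaa TIMESTAMP", "bbb.ccc INT"]


def autoloader_options(schema_hints, schema_location):
    """Build the Auto Loader reader options as a plain dict (hypothetical helper)."""
    return {
        "cloudFiles.format": "json",  # assumption: the source files are JSON
        "cloudFiles.inferColumnTypes": "true",
        "cloudFiles.schemaLocation": schema_location,
        # Auto Loader expects the hints as one comma-separated string.
        "cloudFiles.schemaHints": ", ".join(schema_hints),
    }


# "/tmp/example/_schema" is a placeholder path.
options = autoloader_options(SCHEMA_HINTS, "/tmp/example/_schema")

# On a Databricks cluster the options would then be applied roughly like this
# (left as a comment because it needs a live SparkSession):
# df = (spark.readStream.format("cloudFiles")
#       .options(**options)
#       .load("/tmp/example/input"))  # placeholder input path
```

Building the options as a dict first makes it easy to print exactly which hint string the stream receives.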
If the first data file contains:
{
  "aaa": "2020-09-22T00:00:00Z",
  "bbb": {
    "ccc": 1234
  },
  "ddd": "blabla"
}
Then ddd is added to the schema seamlessly.
However, if the first data file is missing fields within the struct, like so:
{
  "aaa": "2020-09-22T00:00:00Z",
  "ddd": "blabla"
}
Then an error occurs:
Couldn't find column bbb in:
root
|-- aaa: timestamp (nullable = true)
|-- ddd: string (nullable = true)
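To make the gap concrete, here is a small plain-Python check (a sketch only; missing_hint_paths is a hypothetical helper, not part of Auto Loader) that lists which hinted column paths are absent from a sample record:

```python
import json

# Schema hints from the example above.
SCHEMA_HINTS = ["aaa TIMESTAMP", "bbb.ccc INT"]


def missing_hint_paths(record, schema_hints):
    """Return the hinted column paths that do not exist in a parsed JSON record."""
    missing = []
    for hint in schema_hints:
        path = hint.split()[0]  # "bbb.ccc INT" -> "bbb.ccc"
        node = record
        for part in path.split("."):
            if isinstance(node, dict) and part in node:
                node = node[part]  # descend one level into the struct
            else:
                missing.append(path)
                break
    return missing


# The second sample file from above, where the struct "bbb" is absent.
first_file = json.loads('{"aaa": "2020-09-22T00:00:00Z", "ddd": "blabla"}')
print(missing_hint_paths(first_file, SCHEMA_HINTS))  # prints ['bbb.ccc']
```

This only inspects the data; it does not reproduce how Auto Loader resolves hints internally.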
Why doesn't the auto loader add these fields to the schema in this case?
Is there a solution to ensure it does?
Thank you!