Handling Unknown Fields in DLT Pipeline
04-08-2024 04:43 AM
Hi
I'm working on a DLT pipeline where I read JSON files stored in S3.
I'm using Auto Loader to infer the file schema and adding schema hints for some fields to specify their types.
When I run it against a single data file that contains additional fields beyond those in the schema hints,
I encounter the following error: 'terminated with exception: [UNKNOWN_FIELD_EXCEPTION.NEW_FIELDS_IN_RECORD_WITH_FILE_PATH] Encountered unknown fields during parsing.'
The message then lists the additional fields that were found but do not appear in the schema hints, and ends with: 'which can be fixed by an automatic retry: false.'
What does 'automatic retry: false' mean? I've tried various ways of starting and restarting the pipeline, but it still fails.
This happens even though I've set the `inferColumnTypes` option to true and explicitly set `schemaEvolutionMode` to `addNewColumns`, which is already the default.
I've tried the same thing in another pipeline with a slightly less complex file, and it worked great, identifying all the fields that weren't in the schema hint.
But here, with a bit more complexity, it's causing me trouble.
I'd appreciate any help you can provide - thank you very much!
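For readers following along, the setup described above might look roughly like the sketch below. The option keys are the standard `cloudFiles.*` Auto Loader options; the helper name `autoloader_options`, the hint string, and the S3 path in the comments are hypothetical placeholders, not values from my actual pipeline.

```python
def autoloader_options(schema_hints):
    """Collect the Auto Loader options mentioned in this post into one dict.

    `schema_hints` is a comma-separated "column type" string,
    for example "id BIGINT, created_at TIMESTAMP" (hypothetical columns).
    """
    return {
        "cloudFiles.format": "json",
        # Infer real column types instead of reading every field as a string.
        "cloudFiles.inferColumnTypes": "true",
        # Set explicitly here, as described in the post.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
        # Pin the types of selected fields; other fields are still inferred.
        "cloudFiles.schemaHints": schema_hints,
    }


opts = autoloader_options("id BIGINT, created_at TIMESTAMP")

# Inside a DLT table function, the dict would be passed to the stream reader,
# assuming a hypothetical bucket path:
#   spark.readStream.format("cloudFiles").options(**opts).load("s3://my-bucket/raw/")
```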
01-23-2025 05:21 PM - edited 01-23-2025 05:31 PM
Hi community and @mikeagicman, I saw this error when trying to load a JSON file. The problem turned out to be that the schemaLocation I was using pointed to the schema of a different table, so Auto Loader was trying to match columns that did not exist. When I set it to a new schema folder, it worked.
.option('cloudFiles.schemaLocation', '/Workspace/..')
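One way to avoid reusing a schema folder by accident is to derive the location from the table name, so each table gets its own directory. This is only a sketch: the helper `schema_location_for`, the `_schemas` folder name, and the base path are hypothetical and should be adapted to your own layout.

```python
import posixpath


def schema_location_for(base_dir, table_name):
    """Return a schema folder that is unique to one table.

    Keeping one folder per table prevents Auto Loader from picking up
    a schema that was stored for a different table.
    """
    return posixpath.join(base_dir, "_schemas", table_name)


orders_location = schema_location_for("/tmp/autoloader", "orders")
events_location = schema_location_for("/tmp/autoloader", "events")

# The result would then be passed to the reader, for example:
#   .option('cloudFiles.schemaLocation', orders_location)
```

If a table's source files change shape completely, pointing it at a fresh folder (as I did) makes Auto Loader infer the schema again from scratch.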

