Handling Unknown Fields in DLT Pipeline

mikeagicman — Mon, 08 Apr 2024 11:43:57 GMT

Hi
I'm working on a DLT pipeline where I read JSON files stored in S3.
I'm using Auto Loader to infer the file schema, and I add schema hints for some fields to specify their types.
When running it against a single data file that contains additional fields beyond the schema hint,
I encounter the following error: 'terminated with exception: [UNKNOWN_FIELD_EXCEPTION.NEW_FIELDS_IN_RECORD_WITH_FILE_PATH] Encountered unknown fields during parsing.'
After that, I get a list of the additional fields that were identified and do not appear in the schema hint, along with a recommendation: 'which can be fixed by an automatic retry: false.'
What does 'automatic retry: false' mean? I've tried various start and restart methods, but it still doesn't work.

This happens even though I've set the `inferColumnTypes` option to true and explicitly set `schemaEvolutionMode` to `addNewColumns`, which is the default anyway.
I've tried the same thing in another pipeline with a slightly less complex file, and it worked great, identifying all the fields that weren't in the schema hint.
But here, with a bit more complexity, it's causing me trouble.
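
For reference, the setup described in this post might look roughly like the sketch below. The option keys are the documented Auto Loader `cloudFiles.*` options; the helper name `auto_loader_options`, the hinted columns (`id`, `created_at`), and their types are hypothetical and only there for illustration. As far as the Auto Loader documentation describes it, `addNewColumns` stops the stream with an unknown-field error when new columns appear, records them in the schema location, and picks up the updated schema on the next restart, so a single failure of this kind can be expected behaviour rather than a misconfiguration.

```python
def auto_loader_options(schema_hints, schema_location=None):
    """Build an Auto Loader option map like the one described in the post.

    schema_hints: dict of column name -> SQL type (hypothetical columns).
    schema_location: optional folder where Auto Loader tracks the inferred schema.
    """
    opts = {
        "cloudFiles.format": "json",
        # Infer real column types instead of reading everything as strings.
        "cloudFiles.inferColumnTypes": "true",
        # Set explicitly here, although it is the default when no schema is supplied.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
        # Hints use the "name type, name type" form.
        "cloudFiles.schemaHints": ", ".join(
            f"{name} {dtype}" for name, dtype in schema_hints.items()
        ),
    }
    if schema_location is not None:
        opts["cloudFiles.schemaLocation"] = schema_location
    return opts


# Hypothetical hinted columns, for illustration only.
options = auto_loader_options({"id": "BIGINT", "created_at": "TIMESTAMP"})

# Inside a pipeline this map would be passed to the reader, for example:
#   spark.readStream.format("cloudFiles").options(**options).load("<path to the JSON files>")
```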

I'd appreciate any help you can provide - thank you very much!

Re: Handling Unknown Fields in DLT Pipeline

jb1z — Fri, 24 Jan 2025 01:31:21 GMT

Hi community and @mikeagicman, I saw this error when trying to load a JSON file. I discovered the problem was that the schemaLocation I was using pointed to a different table's schema, so Auto Loader was trying to match columns that did not exist. When I set it to a new schema folder, it worked.

.option('cloudFiles.schemaLocation', '/Workspace/..')
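
One way to avoid reusing a schema folder across tables is to derive the location from the target table name and check for collisions before starting the streams. This is only a sketch under assumptions: the helper names `schema_location_for` and `find_shared_locations`, the `_schemas` subfolder, the base directory, and the table names are all hypothetical and not part of any Databricks API.

```python
import posixpath
from collections import defaultdict


def schema_location_for(base_dir, table_name):
    """Return a schema folder that is unique to one target table."""
    return posixpath.join(base_dir, "_schemas", table_name)


def find_shared_locations(table_to_location):
    """Return {location: [tables]} for every location used by more than one table."""
    tables_by_location = defaultdict(list)
    for table, location in table_to_location.items():
        # Normalise so "/x/s" and "/x/s/" count as the same folder.
        tables_by_location[posixpath.normpath(location)].append(table)
    return {
        loc: sorted(tables)
        for loc, tables in tables_by_location.items()
        if len(tables) > 1
    }


base = "/tmp/autoloader_demo"  # hypothetical base directory
locations = {t: schema_location_for(base, t) for t in ("orders", "customers")}
shared = find_shared_locations(locations)  # empty when every table has its own folder
```

Each resulting path would then be passed as the `cloudFiles.schemaLocation` option of the matching stream.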
