cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Handling Unknown Fields in DLT Pipeline

mikeagicman
New Contributor

Hi
I'm working on a DLT pipeline where I read JSON files stored in S3.
I'm using the auto loader to identify the file schema and adding schema hints for some fields to specify their type.
When running it against a single data file that contains additional fields beyond the schema hint,
I encounter the following error: 'terminated with exception: [UNKNOWN_FIELD_EXCEPTION.NEW_FIELDS_IN_RECORD_WITH_FILE_PATH] Encountered unknown fields during parsing.'
After that, I get a list of the additional fields that were identified and do not appear in the schema hint, along with a recommendation: 'which can be fixed by an automatic retry: false.'
What does 'automatic retry: false' mean? I've tried various start and restart methods, but it still doesn't work.

Even though I've set the `inferColumnTypes` option to true and additionally set `schemaEvolutionMode` to `addNewColumns`, even though it's the default.
I've tried the same thing in another pipeline with a slightly less complex file, and it worked great, identifying all the fields that weren't in the schema hint.
But here, with a bit more complexity, it's causing me trouble.

I'd appreciate any help you can provide - thank you very much!

0 REPLIES 0

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group