Databricks Community

mikeagicman · ‎04-08-2024

Hi
I'm working on a DLT pipeline where I read JSON files stored in S3.
I'm using the auto loader to identify the file schema and adding schema hints for some fields to specify their type.
When running it against a single data file that contains additional fields beyond the schema hint,
I encounter the following error: 'terminated with exception: [UNKNOWN_FIELD_EXCEPTION.NEW_FIELDS_IN_RECORD_WITH_FILE_PATH] Encountered unknown fields during parsing.'
After that, I get a list of the additional fields that were identified and do not appear in the schema hint, along with a recommendation: 'which can be fixed by an automatic retry: false.'
What does 'automatic retry: false' mean? I've tried various start and restart methods, but it still doesn't work.

Even though I've set the `inferColumnTypes` option to true and additionally set `schemaEvolutionMode` to `addNewColumns`, even though it's the default.
I've tried the same thing in another pipeline with a slightly less complex file, and it worked great, identifying all the fields that weren't in the schema hint.
But here, with a bit more complexity, it's causing me trouble.

I'd appreciate any help you can provide - thank you very much!

jb1z · ‎01-23-2025

Hi community and @mikeagicman i saw this error when trying to load a json file. I discovered the problem was that the schemaLocation i was using was pointing to a different table schema, so it was trying to match columns that did not exist. When i set this to a new schema folder it worked.

.option('cloudFiles.schemaLocation', '/Workspace/..')

Databricks Community

Handling Unknown Fields in DLT Pipeline

Photos

Join Us as a Local Community Builder!

Announcing the APJ Databricks Smart Business Insights Challenge: Empowering Data-Driven Decision Mak

🚀 Monthly Databricks Get Started Days – Accelerate Your Learning Journey! 🚀

Business Intelligence in the Era of AI

Virtual Learning Festival: 9 April - 30 April

Data + AI Summit 2025 — registration now open!