<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Importing JSON files when format is subject to evolution in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/importing-json-files-when-format-is-subject-to-evolution/m-p/68832#M33753</link>
    <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;I'm reaching out for some assistance with importing JSON files into Databricks. I'm still a beginner, although I've gained experience with various batch data imports (CSV/JSON) for application monitoring. I'm currently facing a challenge with a specific JSON data set.&lt;/P&gt;&lt;P&gt;The structure of this JSON file has evolved over the past year, with new fields being added. As a result, our current Python code, which uses inferSchema to detect the format automatically, encounters errors during import.&lt;/P&gt;&lt;P&gt;I've explored several approaches to handling schema evolution, including schema merging, but haven't yet found one that works consistently. I believe I'm close to a solution, but I'm running into some roadblocks.&lt;/P&gt;&lt;P&gt;Any insights or suggestions on handling evolving JSON schemas during import would be greatly appreciated.&lt;/P&gt;&lt;P&gt;Thanks in advance for your help!&lt;/P&gt;</description>
    <pubDate>Sun, 12 May 2024 21:31:06 GMT</pubDate>
    <dc:creator>etum</dc:creator>
    <dc:date>2024-05-12T21:31:06Z</dc:date>
    <item>
      <title>Importing JSON files when format is subject to evolution</title>
      <link>https://community.databricks.com/t5/data-engineering/importing-json-files-when-format-is-subject-to-evolution/m-p/68832#M33753</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;I'm reaching out for some assistance with importing JSON files into Databricks. I'm still a beginner, although I've gained experience with various batch data imports (CSV/JSON) for application monitoring. I'm currently facing a challenge with a specific JSON data set.&lt;/P&gt;&lt;P&gt;The structure of this JSON file has evolved over the past year, with new fields being added. As a result, our current Python code, which uses inferSchema to detect the format automatically, encounters errors during import.&lt;/P&gt;&lt;P&gt;I've explored several approaches to handling schema evolution, including schema merging, but haven't yet found one that works consistently. I believe I'm close to a solution, but I'm running into some roadblocks.&lt;/P&gt;&lt;P&gt;Any insights or suggestions on handling evolving JSON schemas during import would be greatly appreciated.&lt;/P&gt;&lt;P&gt;Thanks in advance for your help!&lt;/P&gt;</description>
      <pubDate>Sun, 12 May 2024 21:31:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/importing-json-files-when-format-is-subject-to-evolution/m-p/68832#M33753</guid>
      <dc:creator>etum</dc:creator>
      <dc:date>2024-05-12T21:31:06Z</dc:date>
    </item>
  </channel>
</rss>