Reading different file structures for json files in blob stores
2 weeks ago
Hi All,
We are planning to store some mixed JSON files in a blob store and read them into Databricks. I'm wondering whether we should create a separate container for each structure, or whether the various Databricks tools can successfully read the different types from one container. I have my doubts, since blob storage is ultimately a flat namespace: however we lay the files out so they look organized to us humans, there is no real separation between the structures.
I can filter the files in a Python script, but that cuts them off from tools like Auto Loader. Or am I missing something about how to use Auto Loader in this scenario?
How have others approached this?
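One way to make "filter the files in a Python script" concrete is to rely on a filename convention. This is a minimal sketch, assuming files are named `<table>_<date>.json` (the paths and the prefix convention are my assumptions, not from the thread); it shows that a flat container can still be partitioned logically by name:

```python
# Hypothetical sketch: grouping a flat blob listing by filename prefix.
# The paths and the "<table>_<date>.json" naming convention are assumptions.
from collections import defaultdict
import posixpath

def group_by_prefix(paths):
    """Group flat blob paths by the part of the filename before the first '_'."""
    groups = defaultdict(list)
    for p in paths:
        name = posixpath.basename(p)
        prefix = name.split("_", 1)[0]
        groups[prefix].append(p)
    return dict(groups)

files = [
    "container/orders_2024-01-01.json",
    "container/customers_2024-01-01.json",
    "container/orders_2024-01-02.json",
]
grouped = group_by_prefix(files)
```

Each group could then be fed to a per-structure reader, though as noted above this forfeits the incremental-ingestion benefits of Auto Loader unless the tool itself does the filtering.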
a week ago
If they're all JSON but have different structures, you can use the VARIANT type:
https://docs.databricks.com/aws/en/sql/language-manual/data-types/variant-type
There's a few examples in this blog too: https://www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark
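As a rough sketch of the VARIANT approach (assuming a runtime with VARIANT support; the table name, storage path, and field names below are placeholders, and the `COPY INTO ... FILEFORMAT = TEXT` pattern is one possible ingestion route, not the only one):

```sql
-- Placeholder table: every file lands in one schema-flexible column.
CREATE TABLE raw_events (payload VARIANT);

-- Ingest mixed-structure JSON files into the VARIANT column.
-- Path and table name are hypothetical.
COPY INTO raw_events
FROM (
  SELECT PARSE_JSON(value) AS payload
  FROM 'abfss://mycontainer@myaccount.dfs.core.windows.net/raw/'
)
FILEFORMAT = TEXT;

-- Query fields with path notation, regardless of each file's structure.
-- 'customer.id' is an assumed field, for illustration only.
SELECT payload:customer.id::string FROM raw_events;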
Sunday
This doesn't hit the mark: each JSON file represents a different table of data. I think multiple structures in one blob container confuse a lot of tools, which forces file-by-file loading, and that is going to be the least efficient approach.
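If each file maps to its own table, one option is to run one Auto Loader stream per table and select that table's files out of the flat container with a glob. This is a hedged sketch, assuming a filename convention like `<table>_*.json`; the table names, paths, and convention are my assumptions, and `pathGlobFilter` is a standard Spark file-source option that Auto Loader also accepts:

```python
# Hypothetical sketch: one Auto Loader stream per logical table, selected out
# of a flat container by filename glob. Table names, schema/checkpoint paths,
# and the "<table>_*.json" naming convention are all assumptions.

TABLES = ["orders", "customers", "shipments"]  # hypothetical table names

def autoloader_options(table):
    """Build the Auto Loader reader options for one logical table."""
    return {
        "cloudFiles.format": "json",
        "cloudFiles.schemaLocation": f"/schemas/{table}",  # placeholder path
        "pathGlobFilter": f"{table}_*.json",               # assumed naming convention
    }

# In a Databricks notebook, this could drive one stream per table, e.g.:
#
# for table in TABLES:
#     (spark.readStream.format("cloudFiles")
#          .options(**autoloader_options(table))
#          .load("abfss://mycontainer@myaccount.dfs.core.windows.net/raw/")
#          .writeStream
#          .option("checkpointLocation", f"/checkpoints/{table}")  # placeholder
#          .toTable(table))

opts = autoloader_options("orders")
```

Each stream keeps its own checkpoint and schema location, so the tables evolve independently even though the files share one container; the trade-off is that every stream lists the same directory.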

