03-30-2025 03:17 PM
Hi All,
We are planning to store mixed JSON files in blob storage and read them into Databricks. I am wondering whether we should have a container for each structure, or whether the various tools in Databricks can successfully read the different types from one location. I have my doubts, since there is no real way to separate them: blob storage is a flat structure, regardless of how we lay out the file paths to make them readable to us humans.
I can filter the files in a Python script, but that rules out tools like Auto Loader. Or am I missing something about how to use Auto Loader in this scenario?
How have others approached this?
04-01-2025 08:25 AM
If they're all JSON but have different structures, you can use the variant type:
https://docs.databricks.com/aws/en/sql/language-manual/data-types/variant-type
There are a few examples in this blog too: https://www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark
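As a rough illustration of the variant suggestion, here is a minimal sketch that reads every JSON file under one path into a single variant column. It assumes a Databricks runtime that supports VARIANT and the `singleVariantColumn` reader option; the column name `payload`, the helper names, and the path are hypothetical, and `spark` is the session a Databricks notebook provides.

```python
def variant_read_options(column_name: str = "payload") -> dict:
    """Reader options that load each JSON record into one VARIANT column.

    Assumption: the runtime supports the `singleVariantColumn` option.
    `payload` is a hypothetical column name.
    """
    return {"singleVariantColumn": column_name}


def read_mixed_json(spark, source_path: str, column_name: str = "payload"):
    """Read all JSON files under source_path without fixing a schema up front.

    `spark` is the active SparkSession; `source_path` is a hypothetical
    location such as a folder in the blob container.
    """
    return (
        spark.read.format("json")
        .options(**variant_read_options(column_name))
        .load(source_path)
    )
```

Each row then carries the whole record as a variant value, so differently shaped files can land in one table and be split into per-structure tables afterwards.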
04-06-2025 10:02 PM
This doesn't hit the mark, as I am referring to each JSON file representing a different table of data. I think multiple structures in one blob container confuse a lot of tools, which means you have to load file by file, and that is going to be the least efficient approach.
05-02-2025 06:36 AM
Variant should work in this scenario too. There have also been some performance improvements with variant, so much more of the metadata now has stats for efficient processing.
You also don't have to go file by file. You can use something like Auto Loader, which checkpoints all the reads, or you can use "*" in a path to match everything under that location.
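To make the Auto Loader point concrete, here is a minimal sketch of one stream per file type reading from the same mixed folder. It assumes the file names carry a type prefix (for example `orders_*.json`), which the thread does not confirm; the helper names, paths, and table names are hypothetical, and `spark` is the session a Databricks notebook provides.

```python
def autoloader_options(file_type: str, checkpoint_root: str) -> dict:
    """Build Auto Loader options for one JSON file type in a mixed folder.

    Assumption: files are named like `<file_type>_<anything>.json`.
    """
    return {
        "cloudFiles.format": "json",
        # Auto Loader stores the inferred schema for this stream here.
        "cloudFiles.schemaLocation": f"{checkpoint_root}/{file_type}/schema",
        # Only pick up files whose names match this type's pattern.
        "pathGlobFilter": f"{file_type}_*.json",
    }


def start_stream(spark, source_path, file_type, checkpoint_root, target_table):
    """Start an incremental load of one file type into its own table."""
    options = autoloader_options(file_type, checkpoint_root)
    df = spark.readStream.format("cloudFiles").options(**options).load(source_path)
    return (
        df.writeStream
        # The checkpoint records which files were already processed.
        .option("checkpointLocation", f"{checkpoint_root}/{file_type}/checkpoint")
        # Process everything currently available, then stop.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

Calling `start_stream` once per file type gives each structure its own schema location, checkpoint, and target table, so nothing has to be loaded file by file.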
05-15-2025 05:39 PM
I'll look at this once we go to production with the source files. I have split the logs by file type to simplify this, but I'll go back and look again in the test space with mixed files.
05-15-2025 08:44 PM
Organize files by schema into subfolders (e.g., /schema_type_a/, /schema_type_b/) in the same container. Avoid putting all JSON types in one folder.
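One way to apply this layout when writing files is to derive the subfolder from the file name. The sketch below assumes each file name starts with a type prefix followed by an underscore; the prefix-to-folder mapping and the function name are hypothetical.

```python
import posixpath

# Hypothetical mapping from file-name prefix to schema subfolder.
SCHEMA_FOLDERS = {
    "orders": "schema_type_a",
    "customers": "schema_type_b",
}


def target_path(container_root: str, file_name: str) -> str:
    """Return the path a file should be written to, grouped by schema."""
    # Assumption: the text before the first underscore identifies the schema.
    prefix = file_name.split("_", 1)[0]
    folder = SCHEMA_FOLDERS.get(prefix)
    if folder is None:
        raise ValueError(f"No schema folder configured for {file_name!r}")
    return posixpath.join(container_root, folder, file_name)
```

With files grouped this way, each subfolder can be given to its own reader or Auto Loader stream with a single fixed schema.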