Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Reading JSON files with different structures from blob storage

turagittech
New Contributor III

Hi All,

We are planning to store JSON files with mixed structures in blob storage and read them into Databricks. I am wondering whether we should have a separate container for each structure, or whether the various tools in Databricks can reliably read the different types from a single container. I have my doubts, since there is no real way to separate them: blob storage is a flat structure, regardless of how we name the files to make them look organized to us humans.

I can filter the files in a Python script, but that rules out tools like Auto Loader. Or am I missing something about how to use Auto Loader in this scenario?
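One way to avoid a container per structure, while still leaving the routing to the loader rather than a per-file script, is to make the structure visible in the object path (a per-table prefix or filename pattern) and give each table its own load path or filter. Spark file sources accept a `pathGlobFilter` option, and the Auto Loader options documentation lists it among the generic options, but that is worth confirming for your runtime. The sketch below is plain Python, not Auto Loader code: it only illustrates the routing idea with `fnmatchcase`, and the naming convention and table names (`orders_*.json`, `customers_*.json`) are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical naming convention: every file for a table shares a filename pattern.
# These patterns and table names are illustrative only, not from the original post.
TABLE_PATTERNS = {
    "orders": "orders_*.json",
    "customers": "customers_*.json",
}


def route_files(paths, patterns=TABLE_PATTERNS):
    """Group object keys by the first table pattern their file name matches.

    Keys that match no pattern are collected under None so they can be
    reported instead of silently dropped.
    """
    routed = {table: [] for table in patterns}
    routed[None] = []
    for path in paths:
        name = path.rsplit("/", 1)[-1]  # match on the file name, not the full key
        for table, pattern in patterns.items():
            if fnmatchcase(name, pattern):  # case-sensitive on every OS
                routed[table].append(path)
                break
        else:
            # No pattern matched this key.
            routed[None].append(path)
    return routed


if __name__ == "__main__":
    keys = [
        "landing/orders_2024-01-01.json",
        "landing/customers_2024-01-01.json",
        "landing/orders_2024-01-02.json",
        "landing/readme.txt",
    ]
    for table, files in route_files(keys).items():
        print(table, files)
```

In Databricks, each pattern would then correspond to one stream whose load path or glob filter covers only that table, so file discovery does the separation and no file-by-file script is needed. This relies on the naming convention being enforced by whatever writes the files.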

How have others approached this?


holly
Databricks Employee

If they're all JSON but have different structures, you can use the VARIANT type:

https://docs.databricks.com/aws/en/sql/language-manual/data-types/variant-type

There are a few examples in this blog too: https://www.databricks.com/blog/introducing-open-variant-data-type-delta-lake-and-apache-spark

turagittech
New Contributor III

This doesn't quite hit the mark: I am referring to each JSON file representing a different table of data. I think having multiple structures in one blob container confuses a lot of tools, which means you end up loading file by file, and that is going to be the least efficient approach.
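If file names cannot be relied on to identify the table, one possible middle ground (a sketch, not something proposed in the thread) is to fingerprint each file by its top-level keys and batch files that share a fingerprint, so each table is then loaded in one bulk read rather than file by file. This is plain Python and assumes each file holds a single JSON object or a JSON array of objects whose records all share the same shape; `key_signature` and `group_by_signature` are hypothetical helper names.

```python
import json
from collections import defaultdict


def key_signature(text):
    """Return a hashable signature: the sorted top-level keys of the first record.

    Assumes the file holds a single JSON object or a JSON array of objects,
    and that all records in one file share the same shape.
    """
    data = json.loads(text)
    record = data[0] if isinstance(data, list) and data else data
    if not isinstance(record, dict):
        return ()  # empty arrays or non-object JSON get an empty signature
    return tuple(sorted(record))


def group_by_signature(files):
    """Group {path: json_text} entries by key signature, preserving input order."""
    groups = defaultdict(list)
    for path, text in files.items():
        groups[key_signature(text)].append(path)
    return dict(groups)


if __name__ == "__main__":
    files = {
        "a.json": '{"order_id": 1, "amount": 9.5}',
        "b.json": '[{"customer_id": 7, "name": "Ada"}]',
        "c.json": '{"amount": 3.0, "order_id": 2}',
    }
    # Prints the ("amount", "order_id") group with a.json and c.json,
    # then the ("customer_id", "name") group with b.json.
    for signature, paths in group_by_signature(files).items():
        print(signature, paths)
```

This still opens every file once to classify it, so it mainly helps when the classification step writes each group to its own prefix (or records the mapping) and the downstream load then reads a whole group at once.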