Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Issue loading JSON files with different schemas from the same container

kickbuttowski
New Contributor II

Could you tell me whether this scenario will work?

Scenario: I have a container that holds two types of JSON files with different schemas, and the files arrive in a streaming manner. I am using Auto Loader to load the files incrementally. Can a single Auto Loader load JSON files from a container that holds two different schemas? I have already tried it with one file type and one schema and it works, but I am stuck when doing it for two types of JSON files. To load the two types of JSON files, I stored the schema in the ADLS Gen2 container and referenced it in my notebook, but it didn't help.

Accepted Solutions

MichTalebzadeh
Contributor

Short answer: no. By default, a single Spark Auto Loader typically cannot handle JSON files with two different schemas in the same container. Auto Loader relies on schema inference to determine the data structure: it analyses a sample of the files and assumes they share a consistent schema. If the files have different schemas, the inferred schema will be inaccurate for some of them, which leads to errors when processing files whose structure doesn't match it.
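To make the failure mode concrete, here is a minimal, stdlib-only Python sketch. It is a toy model of the behaviour described above, not Auto Loader's actual inference algorithm: `infer_single_schema` takes one schema from a sample and assumes every file shares it, and `mismatches` lists the files that then don't fit. `group_by_schema` shows one possible workaround, routing each file to a group for its own schema so each group can be loaded separately. The file names and records in `FILES` are hypothetical. On Databricks, the assumed analogue would be one Auto Loader (`cloudFiles`) stream per file type, each limited to its own files (for example with a `pathGlobFilter` pattern) and given its own `cloudFiles.schemaLocation`; check this against the Auto Loader documentation before relying on it.

```python
import json
from collections import defaultdict

# Hypothetical contents of one container: two file types, two schemas.
FILES = {
    "orders_001.json": '{"order_id": 1, "amount": 9.5}',
    "orders_002.json": '{"order_id": 2, "amount": 3.0}',
    "customers_001.json": '{"customer_id": "c1", "name": "Ada"}',
}


def schema_of(raw):
    """Simplified schema: sorted (field name, Python type name) pairs."""
    record = json.loads(raw)
    return tuple(sorted((key, type(value).__name__) for key, value in record.items()))


def infer_single_schema(files, sample):
    """Toy inference: look only at the sampled files and assume one schema
    for everything (here, the schema of the first sampled file)."""
    return schema_of(files[sample[0]])


def mismatches(files, schema):
    """Names of files whose schema differs from the single inferred schema."""
    return [name for name in sorted(files) if schema_of(files[name]) != schema]


def group_by_schema(files):
    """Workaround sketch: route each file to the group for its own schema,
    so each group can be handed to its own loader."""
    groups = defaultdict(list)
    for name in sorted(files):
        groups[schema_of(files[name])].append(name)
    return dict(groups)


if __name__ == "__main__":
    inferred = infer_single_schema(FILES, sample=["orders_001.json"])
    print("inferred:", inferred)
    print("files that do not fit:", mismatches(FILES, inferred))
    for schema, names in group_by_schema(FILES).items():
        print(schema, "->", names)
```

Separating the file types up front means each loader only ever sees one schema, which is the assumption the inference step relies on.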

Mich Talebzadeh | Technologist | Data | Generative AI | Financial Fraud
London
United Kingdom

https://en.everybodywiki.com/Mich_Talebzadeh



Disclaimer: The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, "one test result is worth one thousand expert opinions" (Wernher von Braun).
