Issue loading JSON files with different schemas from the same container

kickbuttowski
New Contributor II

Could you tell me whether this scenario will work?

Scenario: I have a container that receives two kinds of JSON files with different schemas, arriving as a stream. I am using Auto Loader to load the files incrementally. Can a single Auto Loader load the JSON files from a container that holds two different schemas? I have already tried it with one file type and one schema, and it works, but I am stuck getting it to work for two types of JSON files. To load both, I stored the schemas in the ADLS Gen2 container and referenced them in my notebook, but that did not help.

Accepted Solutions

MichTalebzadeh
Contributor

The short answer is no. By default, a single Auto Loader stream typically cannot handle JSON files with two different schemas in the same container. Auto Loader relies on schema inference to determine the data structure: it analyses a sample of the files, which it assumes share a consistent schema. If the files have different schemas, the inferred schema will be inaccurate, and that leads to errors when processing files whose structure does not match it.
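
As a minimal sketch of one possible workaround, assuming the two file types can be told apart by file name: run one Auto Loader stream per file type, each with its own file filter and its own schema location, so that each stream only ever infers one schema. The feed names, glob patterns, and paths below are hypothetical placeholders, and `read_feed` needs a Databricks Spark session to actually run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JsonFeed:
    """One JSON file type in the shared container (all values are hypothetical)."""
    name: str
    glob: str             # file-name pattern that selects only this file type
    schema_location: str  # per-feed directory where Auto Loader tracks the schema


def autoloader_options(feed: JsonFeed) -> dict:
    """Build reader options for one feed; each feed keeps its own schema location."""
    return {
        "cloudFiles.format": "json",
        "cloudFiles.schemaLocation": feed.schema_location,
        "pathGlobFilter": feed.glob,
    }


def read_feed(spark, container_path: str, feed: JsonFeed):
    """Create one Auto Loader streaming DataFrame for one feed."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(feed))
        .load(container_path)
    )


# Assumed naming convention: the two file types differ by file-name prefix.
FEEDS = [
    JsonFeed("orders", "orders_*.json", "/mnt/schemas/orders"),
    JsonFeed("customers", "customers_*.json", "/mnt/schemas/customers"),
]
```

Each DataFrame returned by `read_feed` would then be written to its own target with its own checkpoint location. If the two file types cannot be separated by name, placing them in separate folders and pointing each stream at its own folder would serve the same purpose.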

Mich Talebzadeh | Technologist | Data | Generative AI | Financial Fraud
London
United Kingdom

https://en.everybodywiki.com/Mich_Talebzadeh



Disclaimer: The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, "one test result is worth one thousand expert opinions" (Wernher von Braun).
