Data Engineering

Issue loading JSON files with different schemas from the same container

kickbuttowski
New Contributor II

Could you tell me whether this scenario will work or not?

Scenario: I have a container that receives two different kinds of JSON files, each with its own schema, arriving in a streaming manner. I am using Auto Loader to load the files incrementally. Can a single Auto Loader stream load JSON files from a container that holds two different schemas? I have already tried it with one file type and one schema, and it works, but I am stuck when trying it with two types of JSON files. To load both, I stored the schemas in the ADLS Gen2 container and referenced them in my notebook, but that did not help.

Accepted Solutions

MichTalebzadeh
Valued Contributor

Short answer is no. A single Auto Loader stream typically cannot handle JSON files with two different schemas in the same container by default. Auto Loader relies on schema inference to determine the data structure: it analyses a sample of the files, which are assumed to share a consistent schema. If the files have different schemas, the inferred schema will not accurately describe either file type. This leads to errors when processing files whose structure does not match the inferred schema.
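One possible workaround, sketched here rather than taken from the answer above, is to run one Auto Loader stream per file type, each with its own schema location. The sketch assumes the two kinds of file can be told apart by name. The file patterns (`orders_*.json`, `customers_*.json`), the schema paths, the placeholder source path, and both helper function names are hypothetical. `cloudFiles.format`, `cloudFiles.schemaLocation`, and `pathGlobFilter` are existing Auto Loader and file-source options.

```python
from typing import Dict


def autoloader_options(schema_location: str, file_glob: str) -> Dict[str, str]:
    """Build the options for one Auto Loader stream limited to one file type.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return {
        "cloudFiles.format": "json",
        # A separate schema location per stream keeps the inferred schemas apart.
        "cloudFiles.schemaLocation": schema_location,
        # Only files whose names match this glob are picked up by the stream.
        "pathGlobFilter": file_glob,
    }


def read_json_stream(spark, source_path: str, schema_location: str, file_glob: str):
    """Return a streaming DataFrame for one file type (needs a Databricks `spark`)."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(schema_location, file_glob))
        .load(source_path)
    )


# Hypothetical values; replace with your own container, account and file patterns.
SOURCE_PATH = "abfss://<container>@<account>.dfs.core.windows.net/landing/"
ORDERS_OPTS = autoloader_options("/tmp/schemas/orders", "orders_*.json")
CUSTOMERS_OPTS = autoloader_options("/tmp/schemas/customers", "customers_*.json")

# On Databricks, where `spark` is predefined, each file type gets its own stream:
# orders_df = read_json_stream(spark, SOURCE_PATH, "/tmp/schemas/orders", "orders_*.json")
# customers_df = read_json_stream(spark, SOURCE_PATH, "/tmp/schemas/customers", "customers_*.json")
```

Each stream would also need its own checkpoint location when it is written out. If the file names cannot be distinguished, landing the two file types in separate folders and pointing each stream at its own folder should have the same effect.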

Mich Talebzadeh | Technologist | Data | Generative AI | Financial Fraud
London
United Kingdom

https://en.everybodywiki.com/Mich_Talebzadeh



Disclaimer: The information provided is correct to the best of my knowledge but of course cannot be guaranteed. As with any advice, "one test result is worth one-thousand expert opinions" (Wernher von Braun).

