Databricks Community

kickbuttowski · ‎03-16-2024

Could you tell me whether this scenario will work or not?

Scenario: I have a container that receives two kinds of JSON files with different schemas, arriving in a streaming manner. I am using Auto Loader to load the files incrementally. Can a single Auto Loader load the JSON files in a container that holds two different schemas? I've already tried with one file type and one schema and it works, but I'm stuck on doing it for two types of JSON files. To load the two kinds of files, I stored the schema in the ADLS Gen2 container and read it in my notebook, but that didn't help.

MichTalebzadeh · ‎03-16-2024

The short answer is no. A single Auto Loader stream typically cannot handle JSON files in a container with two different schemas by default. Auto Loader relies on schema inference to determine the data structure: it analyses a sample of the data, on the assumption that the files share a consistent schema. If the files have different schemas, the inferred schema will be inaccurate, which leads to errors when processing files whose structure doesn't match it.
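One possible way to act on this is to run one Auto Loader stream per schema against the same container, with each stream picking up only its own files. The sketch below is a minimal illustration under stated assumptions, not a tested recipe for this exact setup: the feed names, file-name patterns, column lists, paths and table names are all hypothetical, and it assumes the two file types can be told apart by file name so that `pathGlobFilter` can separate them. It needs a Databricks `spark` session to actually start a stream.

```python
# Hypothetical feeds: the glob patterns and DDL schemas are placeholders
# and must be replaced with the real file-name patterns and columns.
FEEDS = {
    "orders": {
        "glob": "orders_*.json",
        "schema": "order_id STRING, amount DOUBLE",
    },
    "customers": {
        "glob": "customers_*.json",
        "schema": "customer_id STRING, name STRING",
    },
}


def autoloader_options(feed_name):
    """Build the reader options for one feed (one schema)."""
    return {
        "cloudFiles.format": "json",
        # Only files whose names match this pattern are read by this stream.
        "pathGlobFilter": FEEDS[feed_name]["glob"],
    }


def start_feed(spark, feed_name, source_path, checkpoint_root, target_table):
    """Start one Auto Loader stream for a single feed and return the query."""
    reader = spark.readStream.format("cloudFiles")
    for key, value in autoloader_options(feed_name).items():
        reader = reader.option(key, value)

    # An explicit schema is supplied, so nothing is inferred from a sample.
    df = reader.schema(FEEDS[feed_name]["schema"]).load(source_path)

    # Each stream gets its own checkpoint directory and its own target table.
    return (
        df.writeStream
        .option("checkpointLocation", f"{checkpoint_root}/{feed_name}")
        .toTable(target_table)
    )
```

In a notebook this would be called once per feed, for example `start_feed(spark, "orders", source_path, checkpoint_root, "bronze_orders")` and again for `"customers"`, where all three arguments are placeholders. If the two file types cannot be distinguished by name, landing them in separate directories and pointing each stream at its own path is the simpler alternative.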

Mich Talebzadeh | Technologist | Data | Generative AI | Financial Fraud
London
United Kingdom

https://en.everybodywiki.com/Mich_Talebzadeh

Disclaimer: The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, "one test result is worth one-thousand expert opinions" (Wernher von Braun).

issue in loading the json files in same container with different schemas
