
How to handle complex JSON schema

chrisf_sts
New Contributor II

I have a mounted external directory that is an S3 bucket with multiple subdirectories containing call log files in JSON format. The files are irregular and complex; when I try to use spark.read.json or spark.sql (SELECT *), I get the UNABLE_TO_INFER_SCHEMA error. The files are too complex to build a schema for manually, and there are thousands of them. What is the best approach for creating a DataFrame from this data?

1 REPLY

Kaniz_Fatma
Community Manager

Hi @chrisf_sts, one possible approach is to use spark.read.option("multiline", "true") to read multi-line JSON files. This option allows Spark to handle JSON objects that span multiple lines. When no schema is supplied, Spark samples the files and infers the schema of the JSON data automatically.
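A minimal sketch of that first approach in PySpark, assuming the bucket is mounted at a placeholder path /mnt/call_logs (in a Databricks notebook, spark is already defined):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available in a Databricks notebook

# "multiline" lets Spark parse JSON objects that span several lines;
# "recursiveFileLookup" picks up files in nested subdirectories.
# The mount path is a placeholder, not your actual directory.
df = (
    spark.read
    .option("multiline", "true")
    .option("recursiveFileLookup", "true")
    .json("/mnt/call_logs/")
)

df.printSchema()  # inspect the schema Spark inferred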

 

Another possible approach is to use the explode function to flatten the nested arrays and structs. This function creates a new row for each element of the given array or map column. You can then select the columns you need from the exploded DataFrame.
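A short sketch of the flattening step; the field names here (events, call_id, timestamp, type) are hypothetical stand-ins, since the actual call log schema isn't shown:

from pyspark.sql import functions as F

# Hypothetical layout: each record holds an "events" array of structs.
# explode() emits one output row per array element; the struct's fields
# can then be pulled out with dot notation.
flat = (
    df
    .withColumn("event", F.explode("events"))
    .select(
        "call_id",                                    # assumed top-level field
        F.col("event.timestamp").alias("event_time"),
        F.col("event.type").alias("event_type"),
    )
)

flat.show(truncate=False)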

 

I hope this helps you with your problem. If you have any other questions, feel free to ask me. 😊
