
How to handle complex JSON schema

chrisf_sts
New Contributor II

I have a mounted external directory that is an S3 bucket with multiple subdirectories containing call log files in JSON format. The files are irregular and complex; when I try to use spark.read.json or spark.sql (SELECT *), I get the UNABLE_TO_INFER_SCHEMA error. The files are too complex to build a schema for manually, and there are thousands of them. What is the best approach for creating a DataFrame from this data?

1 REPLY

Kaniz_Fatma
Community Manager

Hi @chrisf_sts, one possible approach is to use spark.read.option("multiline", "true") to read multi-line JSON files. This option allows Spark to handle JSON objects that span multiple lines. When no schema is supplied, Spark samples the files and infers the schema of the JSON data automatically.
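A minimal sketch of that first approach in PySpark, assuming the bucket is mounted at a placeholder path /mnt/call_logs (in a Databricks notebook, spark is already defined):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available in a Databricks notebook

# "multiline" lets Spark parse JSON objects that span several lines;
# "recursiveFileLookup" picks up files in nested subdirectories.
# The mount path is a placeholder, not your actual directory.
df = (
    spark.read
    .option("multiline", "true")
    .option("recursiveFileLookup", "true")
    .json("/mnt/call_logs/")
)

df.printSchema()  # inspect the schema Spark inferred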

 

Another possible approach is to use the explode function to flatten the nested arrays and structs. This function creates a new row for each element of the given array or map column. You can then select the columns you need from the exploded DataFrame.
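A short sketch of the flattening step; the field names here (events, call_id, timestamp, type) are hypothetical stand-ins, since the actual call log schema isn't shown:

from pyspark.sql import functions as F

# Hypothetical layout: each record holds an "events" array of structs.
# explode() emits one output row per array element; the struct's fields
# can then be pulled out with dot notation.
flat = (
    df
    .withColumn("event", F.explode("events"))
    .select(
        "call_id",                                    # assumed top-level field
        F.col("event.timestamp").alias("event_time"),
        F.col("event.type").alias("event_type"),
    )
)

flat.show(truncate=False)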

 

I hope this helps you with your problem. If you have any other questions, feel free to ask me. 😊
