Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to read JSON files embedded in a list of lists?

AmineHY
Contributor

Hello

I am trying to read this JSON file but haven't succeeded.


You can see the head of the file: JSON inside a list of lists. Any idea how to read this file?


1 ACCEPTED SOLUTION

AmineHY
Contributor

Here is my solution; I am sure it can be optimized:

import json

# Load the file once; the top-level value is a list of lists of records.
data = []
with open(path_to_json_file, 'r') as f:
    data.extend(json.load(f))

# Build the DataFrame from the first inner list.
df = spark.createDataFrame(data[0], schema=schema)

✌️

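Note that the solution above keeps only `data[0]`, the first inner list. If the outer list holds more than one inner list, flattening keeps every record instead; a minimal sketch with plain `json` (the sample string and field names are invented for illustration):

```python
import json

# Hypothetical sample mirroring the file layout: a list of lists of records.
raw = '[[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}], [{"id": 3, "name": "c"}]]'

nested = json.loads(raw)

# Flatten the list of lists so records beyond the first inner list
# are not silently dropped.
records = [row for inner in nested for row in inner]

# df = spark.createDataFrame(records, schema=schema)  # as in the solution above
```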

7 REPLIES

Debayan
Esteemed Contributor III

Hi @Amine HADJ-YOUCEF, the supported data sources are limited; please refer to: https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option

Also, https://docs.databricks.com/external-data/json.html

Please let us know if this helps.

Kaniz_Fatma
Community Manager

Hi @Amine HADJ-YOUCEF, please see these threads, which address similar queries, and let me know if that helps:

Thank you for sharing, but these links do not address the exact problem I am facing.

Kaniz_Fatma
Community Manager

Hi @Amine HADJ-YOUCEF, we haven't heard from you since the last response from @Debayan Mukherjee and me, and I was checking back to see if our suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

AmineHY
Contributor

For now I read the file as text, and it gives:

[screenshot: file contents read as text]

It is a concatenation of multiple JSON files; that is why the native JSON parser can't load the data.

The surprising thing is that it detects the right schema!
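If the file really is several JSON documents written back to back, the standard library can walk through them with `json.JSONDecoder.raw_decode`; a rough sketch (the sample input is invented):

```python
import json

def parse_concatenated_json(text):
    """Parse a string holding several JSON documents concatenated together."""
    decoder = json.JSONDecoder()
    docs, idx = [], 0
    while idx < len(text):
        # Skip any whitespace separating the documents.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        # raw_decode returns the parsed value and the index just past it.
        obj, idx = decoder.raw_decode(text, idx)
        docs.append(obj)
    return docs

# Two concatenated JSON arrays, as the screenshot suggests.
docs = parse_concatenated_json('[{"a": 1}] [{"a": 2}]')
```

The resulting `docs` list can then be flattened and passed to `spark.createDataFrame`.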


Hi @Amine HADJ-YOUCEF​, Thank you for sharing the solution with the community!
