
How to read JSON files embedded in a list of lists?

AmineHY
Contributor

Hello,

I am trying to read this JSON file but haven't succeeded.

[screenshot: head of the JSON file]

You can see the head of the file: JSON nested inside a list of lists. Any idea how to read this file?

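For anyone reproducing this, here is a minimal sketch of the layout being described and of the naive read that fails on it; the file contents and path below are hypothetical:

# Hypothetical miniature of the file: a JSON array whose elements are
# themselves lists of record objects, pretty-printed over many lines:
#
# [
#   [
#     {"id": 1, "name": "a"},
#     {"id": 2, "name": "b"}
#   ]
# ]
#
# Spark's default JSON reader expects one JSON document per line
# (JSON Lines), so a multi-line nested array like this typically ends
# up in the _corrupt_record column instead of parsing into rows.
df = spark.read.json("/path/to/file.json")  # placeholder path
df.printSchema()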


7 REPLIES

Debayan
Esteemed Contributor III

Hi @Amine HADJ-YOUCEF, the supported data sources and their options are limited; please refer to: https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option

Also see: https://docs.databricks.com/external-data/json.html

Please let us know if this helps.
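If the file is a single JSON document spread over multiple lines, the multiLine option from the linked page may already be enough. A minimal sketch with a placeholder path; it may or may not cope with the nested list-of-lists layout, but it is the first thing to try:

# multiLine makes Spark parse one JSON document that spans many lines,
# instead of expecting one JSON object per line (JSON Lines).
df = (spark.read
          .option("multiLine", "true")
          .json("/path/to/file.json"))  # placeholder path
df.show()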

Kaniz
Community Manager

Hi @Amine HADJ-YOUCEF, please see these threads, which address similar queries, and let me know if that helps:

AmineHY
Contributor

Thank you for sharing; these links do not address the exact problem I am facing.

Kaniz
Community Manager

Hi @Amine HADJ-YOUCEF, we haven't heard from you since the last response from @Debayan Mukherjee and me, and I was checking back to see if our suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

AmineHY
Contributor

For now I read the file as text, and it gives:

[screenshot: file contents read as text]

It is the concatenation of multiple JSON files, which is why the native JSON parser can't load the data.

The surprising thing is that it detects the right schema!
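For reference, a minimal sketch of that text read, with a placeholder path:

# Read the raw file as lines of text to inspect its actual layout
# before choosing a JSON parsing strategy.
raw = spark.read.text("/path/to/file.json")  # placeholder path
raw.show(5, truncate=False)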

AmineHY
Contributor
(Accepted Solution)

Here is my solution; I am sure it can be optimized:

import json

data = []
with open(path_to_json_file, 'r') as f:
    # json.load returns a list of lists: the records sit inside a nested array
    data.extend(json.load(f))

# The records live in the first inner list; build the DataFrame from it
df = spark.createDataFrame(data[0], schema=schema)

✌️
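A possible refinement, assuming every inner list holds records of the same shape, is to flatten all of the inner lists instead of keeping only the first one; itertools.chain.from_iterable from the standard library does exactly that:

import json
from itertools import chain

with open(path_to_json_file, 'r') as f:
    nested = json.load(f)  # a list of lists of records

# Concatenate every inner list into one flat sequence of records
records = list(chain.from_iterable(nested))
df = spark.createDataFrame(records, schema=schema)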

Kaniz
Community Manager

Hi @Amine HADJ-YOUCEF, thank you for sharing the solution with the community!
