
How to read JSON files embedded in a list of lists?

AmineHY
Contributor

Hello,

I am trying to read this JSON file but haven't succeeded.

[screenshot: head of the JSON file]

You can see the head of the file: JSON nested inside a list of lists. Any idea how to read this file?

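For anyone reproducing this, here is a minimal sketch of the layout being described and of the naive read that fails on it; the file contents and path below are hypothetical:

# Hypothetical miniature of the file: a JSON array whose elements are
# themselves lists of record objects, pretty-printed over many lines:
#
# [
#   [
#     {"id": 1, "name": "a"},
#     {"id": 2, "name": "b"}
#   ]
# ]
#
# Spark's default JSON reader expects one JSON document per line
# (JSON Lines), so a multi-line nested array like this typically ends
# up in the _corrupt_record column instead of parsing into rows.
df = spark.read.json("/path/to/file.json")  # placeholder path
df.printSchema()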


7 REPLIES

Debayan
Esteemed Contributor III

Hi @Amine HADJ-YOUCEF, the supported data sources and their options are limited; please refer to: https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-option

Also see: https://docs.databricks.com/external-data/json.html

Please let us know if this helps.
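If the file is a single JSON document spread over multiple lines, the multiLine option from the linked page may already be enough. A minimal sketch with a placeholder path; it may or may not cope with the nested list-of-lists layout, but it is the first thing to try:

# multiLine makes Spark parse one JSON document that spans many lines,
# instead of expecting one JSON object per line (JSON Lines).
df = (spark.read
          .option("multiLine", "true")
          .json("/path/to/file.json"))  # placeholder path
df.show()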

Kaniz
Community Manager

Hi @Amine HADJ-YOUCEF, please see these threads, which address similar queries, and let me know if that helps:

AmineHY
Contributor

Thank you for sharing; these links do not address the exact problem I am facing.

Kaniz
Community Manager

Hi @Amine HADJ-YOUCEF, we haven't heard from you since the last response from @Debayan Mukherjee and me, and I was checking back to see if our suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

AmineHY
Contributor

For now I read the file as text, and it gives:

[screenshot: file contents read as text]

It is the concatenation of multiple JSON files, which is why the native JSON parser can't load the data.

The surprising thing is that it detects the right schema!
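For reference, a minimal sketch of that text read, with a placeholder path:

# Read the raw file as lines of text to inspect its actual layout
# before choosing a JSON parsing strategy.
raw = spark.read.text("/path/to/file.json")  # placeholder path
raw.show(5, truncate=False)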

AmineHY
Contributor
(Accepted Solution)

Here is my solution; I am sure it can be optimized:

import json

data = []
with open(path_to_json_file, 'r') as f:
    # json.load returns a list of lists: the records sit inside a nested array
    data.extend(json.load(f))

# The records live in the first inner list; build the DataFrame from it
df = spark.createDataFrame(data[0], schema=schema)

✌️
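A possible refinement, assuming every inner list holds records of the same shape, is to flatten all of the inner lists instead of keeping only the first one; itertools.chain.from_iterable from the standard library does exactly that:

import json
from itertools import chain

with open(path_to_json_file, 'r') as f:
    nested = json.load(f)  # a list of lists of records

# Concatenate every inner list into one flat sequence of records
records = list(chain.from_iterable(nested))
df = spark.createDataFrame(records, schema=schema)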

Kaniz
Community Manager

Hi @Amine HADJ-YOUCEF, thank you for sharing the solution with the community!
