Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Sameer_876675
by New Contributor III
  • 4648 Views
  • 3 replies
  • 2 kudos

How to efficiently process a 100GiB JSON nested file and store it in Delta?

Hi, I'm a fairly new user and I am using Azure Databricks to process a ~1000GiB JSON nested file containing insurance policy data. I uploaded the JSON file to Azure Data Lake Gen2 storage and read the JSON file into a dataframe. df=spark.read.option("...
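
A minimal sketch of the pattern the question describes, reading a large nested JSON file from ADLS Gen2 and landing it in Delta. The storage paths and option choices below are illustrative placeholders, not taken from the thread:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical ADLS Gen2 paths -- substitute your own container and account.
source_path = "abfss://raw@mystorage.dfs.core.windows.net/policies/policies.json"
target_path = "abfss://curated@mystorage.dfs.core.windows.net/delta/policies"

# multiLine=true is required when a single JSON document spans multiple lines,
# but it forces each file to be parsed whole; JSON Lines input splits far
# better across executors for files this large.
df = (
    spark.read
    .option("multiLine", "true")
    .json(source_path)
)

# Land the raw records in Delta so later flattening passes read columnar data
# instead of re-parsing the JSON.
df.write.format("delta").mode("overwrite").save(target_path)
```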

Cluster Summary OOM Error
Latest Reply
Annapurna_Hiriy
Databricks Employee
  • 2 kudos

Hi Sameer, please refer to the following documents on how to work with nested JSON: https://docs.databricks.com/optimizations/semi-structured.html and https://learn.microsoft.com/en-us/azure/databricks/kb/_static/notebooks/scala/nested-json-to-dataframe.html
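
For readers who don't want to click through, the gist of those docs is that nested struct fields can be addressed with dot notation and arrays expanded with explode before writing to Delta. A rough sketch, assuming a hypothetical schema with a policy struct and a coverages array (the field names are not from the thread):

```python
from pyspark.sql.functions import col, explode

# df is the DataFrame read from the nested JSON above; the policy struct and
# coverages array used here are assumed field names for illustration.
flattened = (
    df.select(
        col("policy.id").alias("policy_id"),           # struct field via dot notation
        col("policy.holder.name").alias("holder_name"),
        explode(col("coverages")).alias("coverage"),    # one output row per array element
    )
    .select("policy_id", "holder_name", "coverage.type", "coverage.limit")
)

flattened.write.format("delta").mode("append").save("/delta/policies_flat")
```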

2 More Replies
DarshilDesai
by New Contributor II
  • 13135 Views
  • 1 reply
  • 3 kudos

Resolved! How to Efficiently Read Nested JSON in PySpark?

I am having trouble efficiently reading & parsing a large number of stream files in PySpark! Context: Here is the schema of the stream file that I am reading in JSON. Blank spaces are edits for confidentiality purposes. root |-- location_info: ar...
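
The schema in the post is redacted, so the following is only a rough sketch of the usual approach: supply the schema explicitly (skipping inference over many files) and explode the location_info array. The inner field names and input path below are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

# Only location_info comes from the post; its element fields are placeholders.
schema = StructType([
    StructField("location_info", ArrayType(StructType([
        StructField("city", StringType()),
        StructField("region", StringType()),
    ]))),
])

# An explicit schema avoids a full inference pass over every stream file.
raw = spark.read.schema(schema).json("/mnt/streams/*.json")

locations = (
    raw.select(explode(col("location_info")).alias("loc"))
       .select(col("loc.city").alias("city"), col("loc.region").alias("region"))
)
locations.show()
```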

Latest Reply
Chris_Shehu
Valued Contributor III
  • 3 kudos

I'm interested in seeing what others have come up with. Currently I'm using json_normalize(), then taking any additional nested statements and using a loop to pull them out and re-combine them.
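
For reference, the pandas call being described is json_normalize; a small self-contained example with made-up field names, where record_path explodes the inner list and meta carries the outer fields along without a manual loop:

```python
import pandas as pd

# Made-up nested records purely for illustration.
records = [
    {
        "id": 1,
        "meta": {"source": "s3"},
        "events": [
            {"type": "view", "ts": "2023-01-01"},
            {"type": "click", "ts": "2023-01-02"},
        ],
    },
]

# record_path explodes the nested events list into rows; meta repeats the
# outer id and meta.source onto each exploded row.
flat = pd.json_normalize(records, record_path="events", meta=["id", ["meta", "source"]])
print(flat)
```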
