Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

raduq
by Contributor
  • 31999 Views
  • 13 replies
  • 12 kudos

How to efficiently process a 50Gb JSON file and store it in Delta?

Hi, I'm a fairly new user and I am using Azure Databricks to process a ~50 GB JSON file containing real estate data. I uploaded the JSON file to Azure Data Lake Gen2 storage and read it into a DataFrame: df = spark.read.option('multiline', '...

Latest Reply
Renzer
New Contributor II
  • 12 kudos

The Spark connector is super slow. I found that loading the JSON into Azure Cosmos DB and then writing queries to pull sections of the data out was 25x faster, because Cosmos DB indexes the JSON. You can stream-read data from Cosmos DB. You can find Python code sn...

12 More Replies
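For readers landing on this thread, a minimal sketch of the plain Spark route, assuming the multiline read from the original post; the paths and partition count are illustrative, not from the thread. Note that a single multiline JSON file is not splittable, so the initial read is parsed by one task, which is why repartitioning before the write helps.

# A minimal sketch, assuming a single multiline JSON file in ADLS Gen2.
# Paths and the partition count are illustrative assumptions.
df = (spark.read
      .option("multiline", "true")
      .json("abfss://container@account.dfs.core.windows.net/raw/real_estate.json"))

# A single multiline file is parsed by one task, so repartition before writing
# to spread the Delta write across the cluster.
(df.repartition(64)
   .write
   .format("delta")
   .mode("overwrite")
   .save("abfss://container@account.dfs.core.windows.net/delta/real_estate"))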
PK225
by New Contributor III
  • 1189 Views
  • 2 replies
  • 1 kudos
Latest Reply
Vartika
Moderator
  • 1 kudos

Hi @Pavan Kumar, hope you are well. Just wanted to see if you were able to find an answer to your question, and if so, would you like to mark an answer as best? It would be really helpful for the other members too. Cheers!

1 More Replies
konda1
by New Contributor
  • 765 Views
  • 0 replies
  • 0 kudos

Getting an "Executor lost" stage failure error when writing a DataFrame to a Delta table or any file format like Parquet, CSV, or Avro

We are working on a multiline, nested (multilevel) JSON file. The file is read and flattened using PySpark, and the DataFrame shows its data with the display() method. When saving that same DataFrame, it gives an executor lost failure error. For some files it is givi...

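No replies yet; as a hedged note, one common mitigation for executor-lost failures on write is to spread the flattened DataFrame across more, smaller tasks before saving. The DataFrame name, partition count, and path below are illustrative assumptions:

# Smaller tasks reduce per-executor memory pressure during the write.
df_flat = df_flat.repartition(200)

(df_flat.write
        .format("delta")
        .mode("overwrite")
        .save("/mnt/datalake/silver/flattened"))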
kk007
by New Contributor III
  • 2503 Views
  • 4 replies
  • 4 kudos

Photon engine throws error "JSON document exceeded maximum allowed size 400.0 MiB"

I am reading an 83 MB JSON file using spark.read.json(storage_path). When I display the data it seems to display fine, but when I run a count, it complains about the file size being more than 400 MB, which is not true. Photon JSON reader erro...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Kamal Kumar: The error message suggests that the JSON document size is exceeding the maximum allowed size of 400 MB. This could be caused by one or more documents in your JSON file being larger than this limit. It is not a bug, but a limitation set ...

3 More Replies
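A hedged way to check the claim in the reply above is to measure document sizes directly by reading the file as raw text; storage_path is from the original post, everything else is illustrative:

from pyspark.sql import functions as F

# Each line of a JSON-lines file is one document; the max line length shows
# whether any single document approaches the per-document limit.
raw = spark.read.text(storage_path)
raw.agg(F.max(F.length("value")).alias("max_doc_chars")).show()

# If the file is one big multiline document, read it whole to see its true size.
whole = spark.read.option("wholetext", "true").text(storage_path)
whole.select(F.length("value").alias("file_chars")).show()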
AmineHY
by Contributor
  • 7685 Views
  • 7 replies
  • 9 kudos

Resolved! How to read JSON files embedded in a list of lists?

Hello, I am trying to read this JSON file but didn't succeed. You can see the head of the file: JSON inside a list of lists. Any idea how to read this file?

Latest Reply
AmineHY
Contributor
  • 9 kudos

Here is my solution; I am sure it can be optimized:

import json

data = []
with open(path_to_json_file, 'r') as f:
    data.extend(json.load(f))

df = spark.createDataFrame(data[0], schema=schema)

6 More Replies
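A small refinement of the accepted answer above, hedged: data[0] keeps only the first inner list, so if the file really is a list of lists, flattening one level captures every record. path_to_json_file and schema are carried over from the reply:

import json

with open(path_to_json_file, 'r') as f:
    nested = json.load(f)

# Flatten one level: [[r1, r2], [r3]] -> [r1, r2, r3]
records = [row for inner in nested for row in inner]
df = spark.createDataFrame(records, schema=schema)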
Aran_Oribu
by New Contributor II
  • 3461 Views
  • 5 replies
  • 2 kudos

Resolved! Create and update a csv/json file in ADLSG2 with Eventhub in Databricks streaming

Hello, this is my first post here and I am a total beginner with Databricks and Spark. Working on an IoT cloud project with Azure, I'm looking to set up continuous stream processing of data. A current architecture already exists thanks to Stream Ana...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

So the Event Hub creates files (JSON/CSV) on ADLS. You can read those files into Databricks with the spark.read.csv/json methods. If you want to read many files in one go, you can use wildcards, e.g. spark.read.json("/mnt/datalake/bronze/directory/*/*...

4 More Replies
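For the continuous version of what the reply above describes, a sketch using Databricks Auto Loader, which picks up new files as they land in the lake; all paths are illustrative assumptions:

# Auto Loader incrementally discovers new JSON files under the source directory.
stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", "/mnt/datalake/_schemas/iot")
          .load("/mnt/datalake/bronze/directory/"))

# Continuously append the parsed records to a Delta table.
(stream.writeStream
       .format("delta")
       .option("checkpointLocation", "/mnt/datalake/_checkpoints/iot")
       .start("/mnt/datalake/silver/iot"))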
BeginnerBob
by New Contributor III
  • 16673 Views
  • 4 replies
  • 2 kudos

Flatten a complex JSON file and load into a delta table

Hi, I am loading a JSON file into Databricks by simply doing the following:

from pyspark.sql.functions import *
from pyspark.sql.types import *

bronze_path = "wasbs://....../140477.json"
df_incremental = spark.read.option("multiline","true").json(bronze_pat...

Latest Reply
Vidula
Honored Contributor
  • 2 kudos

Hi @Lloyd Vickery, does @Werner Stinckens' response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

3 More Replies
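As a hedged sketch of the flattening step itself, assuming the document carries an array column, here called "items" with nested fields (all column and table names are illustrative, not from the post):

from pyspark.sql import functions as F

# explode() turns each array element into its own row; dotted paths pull
# nested struct fields up into flat columns.
df_flat = (df_incremental
           .withColumn("item", F.explode("items"))
           .select(F.col("id"),
                   F.col("item.name").alias("item_name"),
                   F.col("item.value").alias("item_value")))

df_flat.write.format("delta").mode("overwrite").saveAsTable("bronze_flat")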
laus
by New Contributor III
  • 7006 Views
  • 7 replies
  • 3 kudos

Resolved! How to load a json file in pyspark with colon character in file name

Hi, I'm trying to load this JSON file, which contains the colon character in its name: file_name.2022-03-05_11:30:00.json, but I get the error in the screenshot below saying that there is a relative path in an absolute URL. Any idea how to read this file...

Latest Reply
Noopur_Nigam
Valued Contributor II
  • 3 kudos

Hi @Laura Blancarte, I hope that @Pearl Ubaru's answer helped you in resolving your issue. Please let us know if you need more help on this.

6 More Replies
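One hedged workaround for the colon problem: Hadoop URI parsing chokes on the colon, but the local /dbfs FUSE mount treats the name as a plain POSIX path, so renaming the file there may sidestep the error. The paths below are illustrative assumptions:

import os

# Rename via the local FUSE mount, where the colon is just a filename character.
os.rename("/dbfs/mnt/landing/file_name.2022-03-05_11:30:00.json",
          "/dbfs/mnt/landing/file_name.2022-03-05_11-30-00.json")

df = spark.read.json("dbfs:/mnt/landing/file_name.2022-03-05_11-30-00.json")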
Orianh
by Valued Contributor II
  • 6617 Views
  • 7 replies
  • 3 kudos

Resolved! Read JSON with backslash.

Hello guys, I'm trying to read a JSON file which contains backslashes, and I failed to read it via PySpark. I tried a lot of options but haven't solved this yet. I thought to read all the JSON as text and replace all "\" with "/", but PySpark fails to read it as te...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@orian hindi - Would you be happy to post the solution you came up with and then mark it as best? That will help other members.

6 More Replies
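Since the thread was resolved without the final code being posted, here is a hedged sketch of the approach the poster described (read as text, replace the backslashes, then parse); the path and schema are illustrative assumptions:

from pyspark.sql import functions as F

# Read raw lines, swap backslashes for forward slashes, then parse the cleaned
# JSON strings against an explicit schema.
raw = spark.read.text("dbfs:/mnt/landing/data_with_backslashes.json")
cleaned = raw.select(F.regexp_replace("value", r"\\", "/").alias("value"))
df = (cleaned
      .select(F.from_json("value", schema).alias("parsed"))
      .select("parsed.*"))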