Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MarsSu
by New Contributor II
  • 8340 Views
  • 3 replies
  • 0 kudos

How to merge multiple rows into a single row with an array without running into OOM?

Hi, everyone. Currently I'm trying to implement Spark Structured Streaming with PySpark, and I would like to merge multiple rows into a single row with an array and sink it to a downstream message queue for another service to use. A related example follows: * Befor...
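For reference, a minimal batch sketch of the merge-into-array pattern using groupBy plus collect_list; the column names here are assumed, and a streaming version would additionally need a watermark and an aggregation-friendly output mode so state does not grow unbounded.

```python
# Minimal sketch, assuming order_id/item columns; streaming needs watermarks on top.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("order-1", "item-a"), ("order-1", "item-b"), ("order-2", "item-c")],
    ["order_id", "item"],
)

# Collapse all rows per key into a single row with an array column.
merged = df.groupBy("order_id").agg(F.collect_list("item").alias("items"))
merged.show(truncate=False)
```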

Latest Reply
917074
New Contributor II
  • 0 kudos

Is there any solution to this? @MarsSu, were you able to solve it? Kindly shed some light on this if you resolved it.

2 More Replies
Satty
by New Contributor
  • 6328 Views
  • 1 reply
  • 0 kudos

Solution for ConnectException error: This is often caused by an OOM error that causes the connection to the Python REPL to be closed. Check your query's memory usage.

Whenever I try to run and load multiple files into a single dataframe for processing (the overall file size is more than 15 GB in the single dataframe at the end of the loop), my code crashes every time with the below error...ConnectException error: Thi...

Latest Reply
pvignesh92
Honored Contributor
  • 0 kudos

@Satish Agarwal It seems your system memory is not sufficient to load the 15 GB of files. I believe you are using a Python pandas DataFrame to load the 15 GB rather than Spark. Is there a particular reason you cannot use Spark for this?
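A minimal sketch of the Spark-based alternative the reply points at, assuming CSV inputs with headers under a hypothetical mount path; Spark reads the files in parallel across executors, so the combined 15 GB never has to fit into one in-memory pandas DataFrame on the driver.

```python
# Sketch under assumptions: CSV files with headers at a hypothetical path.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# All matching files become one distributed DataFrame instead of being
# concatenated into a single pandas DataFrame in driver memory.
df = spark.read.option("header", "true").csv("/mnt/data/input/*.csv")

print(df.count())  # trigger the read; keep downstream processing in Spark
```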

James_209101
by New Contributor II
  • 5720 Views
  • 2 replies
  • 5 kudos

Using large dataframe in-memory (data not allowed to be "at rest") results in driver crash and/or out of memory

I'm having trouble working on Databricks with data that we are not allowed to save off or persist in any way. The data comes from an API (which returns a JSON response). We have a Scala package on our cluster that makes the queries (almost 6k queries...
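One common way to avoid accumulating all responses on the driver in this kind of setup, sketched here in PySpark rather than the Scala package the post mentions: run the API calls inside mapPartitions so each executor holds only its own slice. The endpoint, query list, and response shape below are hypothetical.

```python
# Sketch under assumptions: hypothetical endpoint and query IDs.
import json
import urllib.request

from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

query_ids = list(range(6000))  # placeholder for the ~6k queries

def fetch_partition(ids):
    for qid in ids:
        # Hypothetical API call; replace with the real request logic.
        with urllib.request.urlopen(f"https://api.example.com/items/{qid}") as resp:
            payload = json.loads(resp.read())
        yield Row(query_id=qid, payload=json.dumps(payload))

responses = (
    spark.sparkContext
    .parallelize(query_ids, numSlices=64)  # spread the calls across executors
    .mapPartitions(fetch_partition)
    .toDF()
)
responses.createOrReplaceTempView("api_responses")  # query in-memory, nothing written to storage
```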

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @James Held, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

1 More Reply
User16752245312
by Databricks Employee
  • 5039 Views
  • 2 replies
  • 2 kudos

How can I automatically capture the heap dump on the driver and executors in the event of an OOM error?

If you have a job that repeatedly runs into an out-of-memory (OOM) error on either the driver or the executors, automatically capturing a heap dump on the OOM event will help you debug the memory issue and identify the cause of the error. Spark config: spark.execu...
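The Spark config in the excerpt is truncated; a minimal sketch of the usual JVM heap-dump options follows, with a hypothetical dump path rather than the one from the original post. On Databricks these keys are normally set in the cluster's Spark config, since the driver JVM is already running before notebook code executes.

```python
# Minimal sketch, assuming a hypothetical dump path (/tmp/heap_dumps).
# On Databricks, set these keys in the cluster's Spark config, not at runtime.
from pyspark.sql import SparkSession

heap_dump_opts = "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heap_dumps"

spark = (
    SparkSession.builder
    .config("spark.driver.extraJavaOptions", heap_dump_opts)    # dump on driver OOM
    .config("spark.executor.extraJavaOptions", heap_dump_opts)  # dump on executor OOM
    .getOrCreate()
)
```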

Latest Reply
John_360
New Contributor II
  • 2 kudos

Is it necessary to use exactly that HeapDumpPath? I find I'm unable to get driver heap dumps with a different path but otherwise the same configuration. I'm using spark_version 10.4.x-cpu-ml-scala2.12.

1 More Reply
Rnmj
by New Contributor III
  • 13038 Views
  • 3 replies
  • 6 kudos

ConnectException: Connection refused (Connection refused) This is often caused by an OOM error

I am trying to run Python code where a JSON file is flattened to a pipe-separated file. The code works with smaller files, but for huge files of 2.4 GB I get the below error: ConnectException: Connection refused (Connection refused) Error while obtaining a...

Latest Reply
Rnmj
New Contributor III
  • 6 kudos

Hi @Jose Gonzalez, @Werner Stinckens, @Kaniz Fatma, thanks for your responses, appreciate them a lot. The issue was in the code: it was Python/pandas code running on Spark, so only the driver node was being used. I validated this by increasin...
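A sketch of doing the JSON flattening in Spark itself, so the work is distributed across the cluster instead of running as pandas code on the driver only; the input path, nested column names, and output location are assumptions.

```python
# Sketch under assumptions: hypothetical input path and nested schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

raw = spark.read.json("/mnt/data/input/big_file.json")

# Flatten one level of nesting; select further for deeper structures.
flat = raw.select(
    F.col("id"),
    F.col("attributes.name").alias("name"),
    F.col("attributes.value").alias("value"),
)

# Write the flattened rows as pipe-separated text with a header.
(flat.write
     .option("sep", "|")
     .option("header", "true")
     .mode("overwrite")
     .csv("/mnt/data/output/flattened_pipe"))
```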

2 More Replies