cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

How to implement merge multiple rows in single row with array and do not result in OOM?

MarsSu
New Contributor II

Hi, Everyone.

Currently I try to implement spark structured streaming with Pyspark. And I would like to merge multiple rows in single row with array and sink to downstream message queue for another service to use. Related example can follow as:

* Before

| col1 |

| {"a": 1, "b": 2} |

| {"a": 2, "b": 3} |

* After

| col1 |

| [{"a": 1, "b": 2}, {"a": 2, "b": 3}] |

After I survey, can call `collect_list()` to process it. But this function will collect data to driver, so it have some risk of resulting driver node OOM. Especially, I also observe out spark structured streaming application in Databricks job metrics. Indeed have driver memory usage keep increasing and occurs OOM errors.

Based on this scenario, could we have a better solution to solve this and avoid driver node OOM at the same time? If you have any ideas, please share it. I will be appreciate it.

3 REPLIES 3

Anonymous
Not applicable

Hi @Mars Su​ 

Great to meet you, and thanks for your question!

Let's see if your peers in the community have an answer to your question. Thanks.

MarsSu
New Contributor II

Dear @Vidula Khanna​ ,

Thanks for your help. Hope we have a solution to solve it, thanks.

917074
New Contributor II

Is there any solution to this, @MarsSu  were you able to solve this, kindly shed some light on this if you resolve this.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group