<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to implement merge multiple rows in single row with array and do not result in OOM? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-multiple-rows-in-single-row-with-array/m-p/57912#M30956</link>
    <description>&lt;P&gt;Is there any solution to this? &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/52547"&gt;@MarsSu&lt;/a&gt;, were you able to solve it? If so, kindly shed some light on how you resolved it.&lt;/P&gt;</description>
    <pubDate>Fri, 19 Jan 2024 20:05:15 GMT</pubDate>
    <dc:creator>917074</dc:creator>
    <dc:date>2024-01-19T20:05:15Z</dc:date>
    <item>
      <title>How to implement merge multiple rows in single row with array and do not result in OOM?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-multiple-rows-in-single-row-with-array/m-p/2695#M23</link>
      <description>&lt;P&gt;Hi, everyone.&lt;/P&gt;&lt;P&gt;I am currently implementing Spark Structured Streaming with PySpark. I would like to merge multiple rows into a single row containing an array, then sink it to a downstream message queue for another service to consume. For example:&lt;/P&gt;&lt;P&gt;* Before&lt;/P&gt;&lt;P&gt;| col1  |&lt;/P&gt;&lt;P&gt;| {"a": 1, "b": 2} | &lt;/P&gt;&lt;P&gt;| {"a": 2, "b": 3} | &lt;/P&gt;&lt;P&gt;* After&lt;/P&gt;&lt;P&gt;| col1  |&lt;/P&gt;&lt;P&gt;| [{"a": 1, "b": 2}, {"a": 2, "b": 3}] | &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;From what I have found, I can call `collect_list()` to do this. However, as I understand it, this function collects data on the driver, so it risks causing an OOM on the driver node. I have also checked our Spark Structured Streaming application in the Databricks job metrics: driver memory usage does keep increasing, and OOM errors occur.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Given this scenario, is there a better solution that avoids driver node OOM? If you have any ideas, please share them. I would appreciate it.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 23 Jun 2023 01:46:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-multiple-rows-in-single-row-with-array/m-p/2695#M23</guid>
      <dc:creator>MarsSu</dc:creator>
      <dc:date>2023-06-23T01:46:06Z</dc:date>
    </item>
    <item>
      <title>Re: How to implement merge multiple rows in single row with array and do not result in OOM?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-multiple-rows-in-single-row-with-array/m-p/2696#M24</link>
      <description>&lt;P&gt;Hi @Mars Su,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Great to meet you, and thanks for your question!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Let's see if your peers in the community have an answer to your question. Thanks.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 23 Jun 2023 07:16:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-multiple-rows-in-single-row-with-array/m-p/2696#M24</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-06-23T07:16:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to implement merge multiple rows in single row with array and do not result in OOM?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-multiple-rows-in-single-row-with-array/m-p/2697#M25</link>
      <description>&lt;P&gt;Dear @Vidula Khanna,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks for your help. I hope we can find a solution.&lt;/P&gt;</description>
      <pubDate>Sat, 24 Jun 2023 01:06:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-multiple-rows-in-single-row-with-array/m-p/2697#M25</guid>
      <dc:creator>MarsSu</dc:creator>
      <dc:date>2023-06-24T01:06:44Z</dc:date>
    </item>
    <item>
      <title>Re: How to implement merge multiple rows in single row with array and do not result in OOM?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-multiple-rows-in-single-row-with-array/m-p/57912#M30956</link>
      <description>&lt;P&gt;Is there any solution to this? &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/52547"&gt;@MarsSu&lt;/a&gt;, were you able to solve it? If so, kindly shed some light on how you resolved it.&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jan 2024 20:05:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-implement-merge-multiple-rows-in-single-row-with-array/m-p/57912#M30956</guid>
      <dc:creator>917074</dc:creator>
      <dc:date>2024-01-19T20:05:15Z</dc:date>
    </item>
  </channel>
</rss>

