<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SparkOutOfMemoryError when merging data into a table that already has data in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/sparkoutofmemoryerror-when-merging-data-into-a-table-that/m-p/82690#M36720</link>
    <description>&lt;P&gt;Hello Kaniz_Fatma,&lt;/P&gt;&lt;P&gt;The problem was not related to any of the things listed above. It was caused by bad data modelling and by how the relation inside the table was created. Remodelling the data helped.&lt;/P&gt;</description>
    <pubDate>Mon, 12 Aug 2024 05:59:15 GMT</pubDate>
    <dc:creator>vannipart</dc:creator>
    <dc:date>2024-08-12T05:59:15Z</dc:date>
    <item>
      <title>SparkOutOfMemoryError when merging data into a table that already has data</title>
      <link>https://community.databricks.com/t5/data-engineering/sparkoutofmemoryerror-when-merging-data-into-a-table-that/m-p/79097#M35673</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Merging data from a DataFrame into a table fails with the following error:&lt;/P&gt;&lt;PRE&gt;2024 databricksJob aborted due to stage failure: Task 17 in stage 1770.0 failed 4 times, most recent failure: Lost task 17.3 in stage 1770.0 (TID 1669) (1x.xx.xx.xx executor 8): org.apache.spark.memory.SparkOutOfMemoryError: [UNABLE_TO_ACQUIRE_MEMORY] Unable to acquire 28 bytes of memory, got 0.&lt;/PRE&gt;&lt;P&gt;The script:&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;df.createOrReplaceTempView("df_re")&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;%sql&lt;/DIV&gt;&lt;DIV&gt;MERGE INTO catalog.schema.table target USING df_re source&lt;/DIV&gt;&lt;DIV&gt;ON target.DB_ID = source.DB_ID&lt;/DIV&gt;&lt;DIV&gt;WHEN MATCHED THEN UPDATE SET *&lt;/DIV&gt;&lt;DIV&gt;WHEN NOT MATCHED THEN INSERT *&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;The data volume is small: around 200k rows or fewer.&lt;/P&gt;&lt;P&gt;"node_type_id": "Standard_D16as_v5"&lt;/P&gt;&lt;P&gt;"spark_version": "14.3.x-scala2.12"&lt;/P&gt;&lt;P&gt;The cluster has no custom Spark configuration.&lt;/P&gt;&lt;P&gt;Unity Catalog is in use and the Delta tables are in an external location.&lt;/P&gt;&lt;P&gt;One thing to note: the notebook that runs this merge contains many DataFrames and other transformations that build this DataFrame, which is then registered as a temp view.&lt;/P&gt;&lt;P&gt;This is a mystery and I have no idea how to solve it. It is not a data issue, that is for sure.&lt;/P&gt;&lt;P&gt;Any tips or help are welcome.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jul 2024 10:16:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sparkoutofmemoryerror-when-merging-data-into-a-table-that/m-p/79097#M35673</guid>
      <dc:creator>vannipart</dc:creator>
      <dc:date>2024-07-17T10:16:46Z</dc:date>
    </item>
    <item>
      <title>Re: SparkOutOfMemoryError when merging data into a table that already has data</title>
      <link>https://community.databricks.com/t5/data-engineering/sparkoutofmemoryerror-when-merging-data-into-a-table-that/m-p/82690#M36720</link>
      <description>&lt;P&gt;Hello Kaniz_Fatma,&lt;/P&gt;&lt;P&gt;The problem was not related to any of the things listed above. It was caused by bad data modelling and by how the relation inside the table was created. Remodelling the data helped.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Aug 2024 05:59:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sparkoutofmemoryerror-when-merging-data-into-a-table-that/m-p/82690#M36720</guid>
      <dc:creator>vannipart</dc:creator>
      <dc:date>2024-08-12T05:59:15Z</dc:date>
    </item>
  </channel>
</rss>

