<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Spark Out of Memory Error in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/spark-out-of-memory-error/m-p/79896#M35872</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, thanks for the detailed suggestions.&lt;/P&gt;&lt;P&gt;I believe the first reference relates to the issue; however, after adjusting &lt;EM&gt;spark.driver.maxResultSize&lt;/EM&gt; to various values - e.g., 10g, 20g, 30g - a new error ensues (see below).&lt;/P&gt;&lt;P&gt;The operation involves a &lt;EM&gt;collect()&lt;/EM&gt; on a Delta table with 380 MM rows and 5 columns (3.2 GB, partitioned into 55 files). If the average row size is 48 bytes (per the initial error), shouldn't 20 GB be sufficient?&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;New Error&lt;/STRONG&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;EM&gt;The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.&lt;/EM&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;EM&gt;at com.databricks.spark.chauffeur.Chauffeur.onDriverStateChange(Chauffeur.scala:1367)&lt;/EM&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Mon, 22 Jul 2024 15:27:58 GMT</pubDate>
    <dc:creator>leungi</dc:creator>
    <dc:date>2024-07-22T15:27:58Z</dc:date>
    <item>
      <title>Spark Out of Memory Error</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-out-of-memory-error/m-p/79181#M35695</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Background&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Using the R {sparklyr} package to fetch data from tables in Unity Catalog, I ran into the error below.&lt;/P&gt;&lt;P&gt;Tried the following, to no avail:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Using a memory-optimized cluster - e.g., E4d.&lt;/LI&gt;&lt;LI&gt;Using a bigger (RAM) cluster - e.g., E8d.&lt;/LI&gt;&lt;LI&gt;Enabling auto-scaling.&lt;/LI&gt;&lt;LI&gt;Setting the Spark config:&lt;UL&gt;&lt;LI&gt;spark.driver.maxResultSize 4096&lt;/LI&gt;&lt;LI&gt;spark.memory.offHeap.enabled true&lt;/LI&gt;&lt;LI&gt;spark.driver.memory 8082&lt;/LI&gt;&lt;LI&gt;spark.executor.instances 4&lt;/LI&gt;&lt;LI&gt;spark.memory.offHeap.size 7284&lt;/LI&gt;&lt;LI&gt;spark.executor.memory 7284&lt;/LI&gt;&lt;LI&gt;spark.executor.cores 4&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Error&lt;/STRONG&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;EM&gt;Error: org.apache.spark.memory.SparkOutOfMemoryError: Total memory usage during row decode exceeds spark.driver.maxResultSize (4.0 GiB). The average row size was 48.0 B, with 2.9 GiB used for temporary buffers. Run `sparklyr::spark_last_error()` to see the full Spark error (multiple lines). To use the previous style of error message, set `options("sparklyr.simple.errors" = TRUE)`.&lt;/EM&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 18 Jul 2024 06:40:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-out-of-memory-error/m-p/79181#M35695</guid>
      <dc:creator>leungi</dc:creator>
      <dc:date>2024-07-18T06:40:29Z</dc:date>
    </item>
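The numbers in the error message above are enough to see why the 4 GiB limit trips. A back-of-the-envelope check in plain Python - figures taken verbatim from the error text; nothing here is a documented Spark formula, just arithmetic on what the message reports:

```python
# Figures reported by the SparkOutOfMemoryError in the post above.
max_result_size = 4.0 * 2**30   # spark.driver.maxResultSize: 4.0 GiB
temp_buffers    = 2.9 * 2**30   # temporary decode buffers: 2.9 GiB
avg_row_size    = 48.0          # average row size: 48.0 B

# Budget left for actual row data once temporary buffers are counted
# against the same limit.
row_budget = max_result_size - temp_buffers      # about 1.1 GiB
max_rows   = int(row_budget // avg_row_size)     # rows that fit under the limit

print(f"row budget: {row_budget / 2**30:.1f} GiB")
print(f"rows that fit: {max_rows:,}")            # roughly 24.6 million
```

The temporary buffers alone consume most of the configured limit, so a collect() of a large table fails regardless of cluster size: spark.driver.maxResultSize caps the collected result, and executor RAM or auto-scaling cannot raise that cap.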
    <item>
      <title>Re: Spark Out of Memory Error</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-out-of-memory-error/m-p/79896#M35872</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, thanks for the detailed suggestions.&lt;/P&gt;&lt;P&gt;I believe the first reference relates to the issue; however, after adjusting &lt;EM&gt;spark.driver.maxResultSize&lt;/EM&gt; to various values - e.g., 10g, 20g, 30g - a new error ensues (see below).&lt;/P&gt;&lt;P&gt;The operation involves a &lt;EM&gt;collect()&lt;/EM&gt; on a Delta table with 380 MM rows and 5 columns (3.2 GB, partitioned into 55 files). If the average row size is 48 bytes (per the initial error), shouldn't 20 GB be sufficient?&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;New Error&lt;/STRONG&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;EM&gt;The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.&lt;/EM&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV class=""&gt;&lt;EM&gt;at com.databricks.spark.chauffeur.Chauffeur.onDriverStateChange(Chauffeur.scala:1367)&lt;/EM&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 22 Jul 2024 15:27:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-out-of-memory-error/m-p/79896#M35872</guid>
      <dc:creator>leungi</dc:creator>
      <dc:date>2024-07-22T15:27:58Z</dc:date>
    </item>
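The follow-up's sizing question (shouldn't 20 GB be sufficient?) can be checked the same way. A back-of-the-envelope estimate in plain Python, assuming the 48 B average row size and 380 MM row count quoted in the thread; the roughly-2x peak factor is an assumption extrapolated from the 2.9 GiB of temporary buffers reported against the original 4 GiB limit, not a documented figure:

```python
rows         = 380_000_000   # row count quoted in the follow-up post
avg_row_size = 48            # bytes, from the original error message

# Raw payload the driver must hold after collect().
raw_gib = rows * avg_row_size / 2**30
print(f"raw collected payload: {raw_gib:.1f} GiB")   # 17.0 GiB

# In the original error, temporary decode buffers (2.9 GiB) rivalled the
# payload itself, so peak usage during decode may approach twice the raw
# size - and all of it must fit in the driver JVM heap on top of normal
# driver work.
peak_estimate_gib = 2 * raw_gib
print(f"rough peak during decode: {peak_estimate_gib:.1f} GiB")
```

On this estimate a 20g maxResultSize barely covers the raw rows before buffers or JVM overhead, which would be consistent with the driver crashing outright rather than raising a clean SparkOutOfMemoryError. The underlying bottleneck is collecting 380 MM rows to a single driver at all.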
  </channel>
</rss>

