<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Spark Memory Configuration– Request for Clarification in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/spark-memory-configuration-request-for-clarification/m-p/128081#M48151</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/178566"&gt;@sowanth&lt;/a&gt;!&lt;/P&gt;
&lt;P&gt;Off-heap memory is automatically configured on some clusters to improve stability and reduce Java garbage collection issues, particularly for Photon or heavy caching workloads. This setting isn’t coming from your repo or policies but is applied at the cluster level. If your Spark jobs don’t require this much off-heap memory, you can adjust it by overriding spark.memory.offHeap.enabled and spark.memory.offHeap.size in the cluster’s Spark configuration.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://kb.databricks.com/en_US/clusters/spark-executor-memory" target="_self"&gt;https://kb.databricks.com/en_US/clusters/spark-executor-memory&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 11 Aug 2025 15:25:29 GMT</pubDate>
    <dc:creator>Advika</dc:creator>
    <dc:date>2025-08-11T15:25:29Z</dc:date>
    <item>
      <title>Spark Memory Configuration– Request for Clarification</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-memory-configuration-request-for-clarification/m-p/127836#M48100</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hi Team,&lt;BR /&gt;I have noticed that the following Spark configuration is being applied, although it is not defined in our repo or in any of our policies:&lt;BR /&gt;&lt;BR /&gt;spark.memory.offHeap.enabled = true&lt;BR /&gt;spark.memory.offHeap.size = around 3/4 of the node instance memory (i.e., 1-3x the executor memory)&lt;BR /&gt;&lt;BR /&gt;This setup leaves only about 1/4 of the node's memory for executor allocation. We can override this setting in our own Spark configuration, but we are not sure where it is being set.&lt;BR /&gt;&lt;BR /&gt;Such a large off-heap allocation is rarely needed in our case.&lt;BR /&gt;&lt;BR /&gt;1. Do you have any specific recommendations for when to use this much off-heap memory?&lt;BR /&gt;2. Where is the off-heap memory config set in the Databricks cluster? Additionally, could you explain the rationale behind allocating more off-heap memory than executor memory in this strategy?&lt;BR /&gt;&lt;BR /&gt;Databricks Runtime version:&amp;nbsp;12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12) and 13.3 LTS&lt;BR /&gt;&lt;BR /&gt;Thanks &amp;amp; Regards,&lt;BR /&gt;Sowanth&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 08 Aug 2025 16:31:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-memory-configuration-request-for-clarification/m-p/127836#M48100</guid>
      <dc:creator>sowanth</dc:creator>
      <dc:date>2025-08-08T16:31:36Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Memory Configuration– Request for Clarification</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-memory-configuration-request-for-clarification/m-p/128081#M48151</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/178566"&gt;@sowanth&lt;/a&gt;!&lt;/P&gt;
&lt;P&gt;Off-heap memory is automatically configured on some clusters to improve stability and reduce Java garbage collection issues, particularly for Photon or heavy caching workloads. This setting isn’t coming from your repo or policies but is applied at the cluster level. If your Spark jobs don’t require this much off-heap memory, you can adjust it by overriding spark.memory.offHeap.enabled and spark.memory.offHeap.size in the cluster’s Spark configuration.&lt;/P&gt;
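&lt;P&gt;As a rough illustration of the override described above, the Python sketch below prints the two settings as space-separated "key value" lines, the form used in a cluster's Spark config field. The helper name offheap_conf_lines and the 4 GiB value are assumptions made for this example, not Databricks defaults or recommendations; size the value for your own workload.&lt;/P&gt;

```python
# Hypothetical helper (not a Databricks or Spark API): renders the two
# off-heap settings as "key value" lines for a cluster's Spark config.


def offheap_conf_lines(offheap_gib):
    """Return Spark config lines for the given off-heap size in GiB.

    Passing 0 disables off-heap memory instead of setting a size.
    """
    if offheap_gib == 0:
        return ["spark.memory.offHeap.enabled false"]
    if not offheap_gib > 0:
        raise ValueError("off-heap size must be zero or positive")
    # spark.memory.offHeap.size is read as bytes when no unit suffix is given.
    size_bytes = int(offheap_gib * 1024 ** 3)
    return [
        "spark.memory.offHeap.enabled true",
        f"spark.memory.offHeap.size {size_bytes}",
    ]


if __name__ == "__main__":
    # Example: cap off-heap memory at 4 GiB (an arbitrary illustrative value).
    print("\n".join(offheap_conf_lines(4)))
```

&lt;P&gt;The printed lines can be pasted into the cluster's Spark configuration. Spark expects a positive size whenever off-heap memory is enabled, which is why the sketch emits only the "enabled false" line for a size of 0.&lt;/P&gt;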
&lt;P&gt;&lt;A href="https://kb.databricks.com/en_US/clusters/spark-executor-memory" target="_self"&gt;https://kb.databricks.com/en_US/clusters/spark-executor-memory&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 11 Aug 2025 15:25:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-memory-configuration-request-for-clarification/m-p/128081#M48151</guid>
      <dc:creator>Advika</dc:creator>
      <dc:date>2025-08-11T15:25:29Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Memory Configuration– Request for Clarification</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-memory-configuration-request-for-clarification/m-p/128322#M48208</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/152834"&gt;@Advika&lt;/a&gt;,&lt;BR /&gt;&lt;SPAN&gt;Thanks for the details; much appreciated.&amp;nbsp;&lt;BR /&gt;&lt;/SPAN&gt;Yes, I have already read this document, but it does not say how much benefit this higher default off-heap memory provides on these node types, and it gives no benchmark details for caching or other workloads.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;/P&gt;&lt;P&gt;Sowanth&lt;/P&gt;</description>
      <pubDate>Wed, 13 Aug 2025 11:23:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-memory-configuration-request-for-clarification/m-p/128322#M48208</guid>
      <dc:creator>sowanth</dc:creator>
      <dc:date>2025-08-13T11:23:17Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Memory Configuration– Request for Clarification</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-memory-configuration-request-for-clarification/m-p/128345#M48213</link>
      <description>&lt;P&gt;Now I understand how it's automatically configured in our cluster, along with the rationale behind this off-heap memory approach.&lt;/P&gt;&lt;P&gt;However, I have some concerns about this configuration:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;General applicability: Most jobs don't actually require a 70% off-heap memory allocation.&lt;/LI&gt;&lt;LI&gt;Industry recommendations: Leading LLMs (Claude, GPT, DeepSeek AI) don't recommend such high off-heap memory usage. They suggest a much smaller allocation, sized as a small percentage of the executor memory.&lt;/LI&gt;&lt;LI&gt;Lack of benchmarks: I haven't found any test results or benchmarks supporting this configuration for caching or other workloads, or even for GC optimization.&lt;/LI&gt;&lt;LI&gt;Cost implications: While this might help in some edge cases, it doesn't seem beneficial for general use cases and could be significantly increasing our costs.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Could you please share any benchmark data or test results you have for this specific job configuration? This would help us better understand the performance benefits versus the cost impact.&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;Sowanth&lt;/P&gt;</description>
      <pubDate>Wed, 13 Aug 2025 13:26:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-memory-configuration-request-for-clarification/m-p/128345#M48213</guid>
      <dc:creator>sowanth</dc:creator>
      <dc:date>2025-08-13T13:26:22Z</dc:date>
    </item>
  </channel>
</rss>

