
Spark Memory Configuration – Request for Clarification

sowanth
New Contributor II

Hi Team,
I have noticed the following Spark configuration is being applied, though it's not defined in our repo or anywhere in the policies:

spark.memory.offHeap.enabled = true
spark.memory.offHeap.size = ~3/4 of the node's instance memory (i.e., 1-3x the executor memory)

This setup leaves only about 1/4 of the node's memory for executor (heap) allocation. We can override this setting in our own Spark configuration, but we're not sure where it is being set in the first place.
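
To put illustrative numbers on it (assuming a hypothetical 64 GB worker node):

  node memory:             64 GB
  off-heap (~3/4):         ~48 GB  (spark.memory.offHeap.size)
  left for executor heap:  ~16 GB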

Such a large off-heap allocation is rarely needed for our use case.

1. Do you have any specific recommendations for when this much off-heap memory should be used?
2. May I know where the off-heap memory config is set in the Databricks cluster? Additionally, could you explain the rationale behind allocating more off-heap memory than executor memory in this strategy?

Databricks Runtime version: 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12) and 13.3 LTS

Thanks & Regards,
Sowanth

3 REPLIES

Advika
Databricks Employee

Hello @sowanth!

Off-heap memory is automatically configured on some clusters to improve stability and reduce Java garbage collection issues, particularly for Photon or heavy caching workloads. This setting isn't coming from your repo or policies but is applied at the cluster level. If your Spark jobs don't require this much off-heap memory, you can adjust it by overriding spark.memory.offHeap.enabled and spark.memory.offHeap.size in the cluster's Spark configuration.
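
For example, you could add something like the following under the cluster's Spark config (Compute > your cluster > Advanced options > Spark) to disable off-heap allocation entirely (illustrative values, adjust for your workload):

spark.memory.offHeap.enabled false
spark.memory.offHeap.size 0

Alternatively, keep it enabled with a smaller footprint, e.g. spark.memory.offHeap.size 4g; note that Spark requires a positive size whenever spark.memory.offHeap.enabled is true.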

https://kb.databricks.com/en_US/clusters/spark-executor-memory

sowanth
New Contributor II

Hi @Advika,
Thanks for the details, much appreciated.
Yes, I had already referred to this document, but I couldn't find anything on how much benefit this higher default off-heap memory provides on these node types, or any benchmark details for caching or other workloads.

Regards,

Sowanth

sowanth
New Contributor II

Now I understand how it's automatically configured in our cluster along with the rationale behind this off-heap memory approach.

However, I have some concerns about this configuration:

  1. General applicability: Most jobs don't actually require 70% off-heap memory allocation
  2. Industry recommendations: Leading LLMs (Claude, GPT, DeepSeek) don't recommend such high off-heap memory usage; they suggest a much smaller percentage, carved out of the executor memory.
  3. Lack of benchmarks: I haven't found any test results or benchmarks supporting this configuration for caching or other workloads, even for GC optimization
  4. Cost implications: While this might help in some edge cases, it doesn't seem beneficial for general use cases and could be significantly increasing our costs

Could you please share any benchmark data or test results you have for this specific job configuration? This would help us better understand the performance benefits versus the cost impact.
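
In case it helps anyone else, the effective values can be confirmed from a notebook with a small PySpark check (a minimal sketch; the "not set" fallbacks are just placeholders):

# Read the cluster-level Spark conf from the driver's SparkContext
conf = spark.sparkContext.getConf()
print(conf.get("spark.memory.offHeap.enabled", "not set"))
print(conf.get("spark.memory.offHeap.size", "not set"))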

Best regards,
Sowanth
