Re: Relevance of off heap memory and usage

Vidhi_Khaitan · ‎05-13-2025

Hi team,

Answering your questions below -
spark.executor.memoryOverhead: This is additional non-heap memory allocated to each executor on top of the JVM heap (spark.executor.memory). In short, it covers memory that the executor process needs outside the heap:
1) JVM overhead, including metadata and garbage collection (GC) overheads.
2) Native and direct-buffer memory used by Spark itself, such as network buffers for shuffle transfers.
3) Python interpreter memory in case of PySpark usage.
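
To make the sizing concrete, here is a minimal sketch of how the default overhead is derived when spark.executor.memoryOverhead is not set explicitly. It assumes the open-source Spark defaults (10% of executor memory with a 384 MiB floor); the function name is hypothetical, and your platform or Spark version may apply different defaults.

```python
def default_memory_overhead_mib(executor_memory_mib, factor=0.10, minimum_mib=384):
    """Approximate the default spark.executor.memoryOverhead in MiB.

    Assumes the open-source Spark defaults: a 10% factor and a 384 MiB floor.
    """
    return max(int(executor_memory_mib * factor), minimum_mib)


# For a large heap the 10% factor decides the result.
print(default_memory_overhead_mib(8192))
# For a small heap the 384 MiB floor decides the result.
print(default_memory_overhead_mib(2048))
```

If executors are being killed for exceeding their memory limit, raising spark.executor.memoryOverhead explicitly is usually the first thing to try.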

spark.memory.offHeap.size: This sets the amount of off-heap memory that Spark's memory manager may use on each executor. It only takes effect when spark.memory.offHeap.enabled is set to true, and it must then be a positive value. Off-heap memory lives outside the JVM heap and is often used for large contiguous blocks of data (e.g., shuffle data or intermediate results), which avoids GC overhead.

Operations where Spark can use off-heap memory:
Caching large datasets: Spark can store cached data off-heap (for example with the OFF_HEAP storage level) to reduce pressure on the JVM heap.
Shuffle operations: Off-heap memory can be used for large shuffles to minimize GC pressure.
Sorting and aggregations: Working buffers for large sorts and aggregations may be allocated off-heap.
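
As a rough illustration of how the two off-heap settings go together, the sketch below builds the pair of properties and renders them as --conf flags. The helper names are hypothetical and the 2048 MiB size is only an example value; on Databricks you would typically put the same key/value pairs in the cluster's Spark config instead, since executor memory settings have to be in place before the executors start.

```python
def off_heap_conf(size_mib):
    """Return the two properties needed to turn on Spark-managed off-heap memory."""
    if size_mib <= 0:
        # Spark requires a positive size whenever off-heap use is enabled.
        raise ValueError("spark.memory.offHeap.size must be positive when enabled")
    return {
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": f"{size_mib}m",
    }


def as_submit_flags(conf):
    """Render a property dict as a flat list of spark-submit --conf arguments."""
    flags = []
    for key, value in conf.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


# Example value only: 2048 MiB of off-heap memory per executor.
print(" ".join(as_submit_flags(off_heap_conf(2048))))
```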

If spark.memory.offHeap.enabled is set to false, only the spark.memory.offHeap.size allocation is disabled. spark.executor.memoryOverhead is unaffected, because it covers JVM-related overheads and other non-heap memory used by the executor process.
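
To show how the settings add up, here is a small sketch of the per-executor memory request, assuming the Spark 3.x behaviour where the container size is the sum of the heap, the overhead, the off-heap size (only when enabled), and any dedicated PySpark memory. The function name and the numbers are illustrative, not taken from a real cluster.

```python
def executor_container_mib(heap_mib, overhead_mib, off_heap_enabled,
                           off_heap_size_mib, pyspark_mib=0):
    """Estimate the total memory requested per executor, in MiB.

    Assumes Spark 3.x accounting: the off-heap size is counted only when
    spark.memory.offHeap.enabled is true; the overhead is always counted.
    """
    off_heap = off_heap_size_mib if off_heap_enabled else 0
    return heap_mib + overhead_mib + off_heap + pyspark_mib


# Same heap and overhead; only the off-heap switch changes.
with_off_heap = executor_container_mib(8192, 819, True, 2048)
without_off_heap = executor_container_mib(8192, 819, False, 2048)
print(with_off_heap, without_off_heap)
```

The difference between the two results is exactly the off-heap size, which is the point above: turning the flag off removes that allocation and leaves the overhead as it was.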

I hope I have answered your questions!