Re: Out of Memory after adding distinct operation

Avinash_Narala · ‎01-25-2025

Overhead memory (used for things like off-heap storage and shuffle operations) is separate from spark.executor.memory.

Let's break this down clearly:

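Each executor's total memory consists of the following components:

- spark.executor.memory: the JVM heap for the executor.
- spark.executor.memoryOverhead: additional memory outside the heap, used for things like off-heap storage and shuffle operations.

The total memory allocated per executor is the sum of the two: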

Total Executor Memory = spark.executor.memory + spark.executor.memoryOverhead

Given the setup you described:

Machine Memory: 16GB per worker node.
Spark Executor Memory: 7.6GB per executor (spark.executor.memory).
Available for Overhead: The remaining memory on the machine after accounting for spark.executor.memory.

Let’s calculate the breakdown:

Executor JVM Memory: 7.6GB is reserved for spark.executor.memory.
Overhead Memory:
- By default, spark.executor.memoryOverhead is max(384MB, 0.1 * spark.executor.memory), i.e., max(384MB, 0.76GB) = 0.76GB in your case.
- This leaves 16GB - (7.6GB + 0.76GB) = ~7.64GB for the OS, YARN, or other processes.
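
If you want to redo this arithmetic for other node or executor sizes, here is a minimal Python sketch of the same calculation. The function names are made up for illustration, and it assumes the default rule quoted above (the larger of 384MB or 10% of spark.executor.memory) and treats 384MB as 0.384GB to keep the units simple. It does not read anything from a real cluster.

```python
# Hypothetical helpers for illustration only; not part of any Spark API.
MIN_OVERHEAD_GB = 0.384   # 384MB floor, treated as 0.384GB (assumption for simple units)
OVERHEAD_FACTOR = 0.10    # default fraction of spark.executor.memory


def default_memory_overhead_gb(executor_memory_gb):
    """Default overhead: the larger of the 384MB floor or 10% of executor memory."""
    return max(MIN_OVERHEAD_GB, OVERHEAD_FACTOR * executor_memory_gb)


def memory_breakdown(node_memory_gb, executor_memory_gb):
    """Split a worker node's memory into executor total and what is left over."""
    overhead = default_memory_overhead_gb(executor_memory_gb)
    executor_total = executor_memory_gb + overhead
    return {
        "overhead_gb": overhead,
        "executor_total_gb": executor_total,
        "remaining_gb": node_memory_gb - executor_total,
    }


# Numbers from this thread: 16GB node, 7.6GB spark.executor.memory.
breakdown = memory_breakdown(node_memory_gb=16.0, executor_memory_gb=7.6)
for name, value in breakdown.items():
    print(f"{name}: {value:.2f}")
# overhead_gb: 0.76
# executor_total_gb: 8.36
# remaining_gb: 7.64
```

For small executors the 384MB floor applies instead of the 10% rule; for example, default_memory_overhead_gb(2.0) gives 0.384.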

Please mark this as the solution if it helps.

Regards,

Avinash N