05-05-2025 10:52 PM
I have a PySpark job reading only ~50-55 GB of Parquet data from a Delta table on Databricks. The job runs on n2-highmem-4 GCP VMs with 1-15 workers and autoscaling enabled. Each worker VM of type n2-highmem-4 has 32 GB memory and 4 cores, and each VM runs one executor with 22 GB allocated, i.e. 22 * 15 = 330 GB of total executor memory, which seems large enough for ~55 GB of input data. The shuffle partition count is set to 200, but I'm still getting OOM errors.
Could you please help me understand why this is not sufficient and leads to OOM? Also, is it necessary for all ~13 partitions assigned to an executor to fit in memory at once, or, since only 4 tasks run in parallel per executor, is it enough for memory to accommodate just 4 partitions at a time?
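For reference, a minimal sketch of the setup described above; the app name, table path, and the idea of setting these in code are illustrative assumptions (on Databricks, executor memory/cores and autoscaling are set in the cluster config, not in the job):

```python
from pyspark.sql import SparkSession

# Hypothetical reconstruction of the described job setup.
spark = (
    SparkSession.builder
    .appName("delta-read-oom-repro")                  # illustrative name
    .config("spark.sql.shuffle.partitions", "200")    # as described above
    .getOrCreate()
)

# ~50-55 GB of Parquet data backing a Delta table; the path is a placeholder.
df = spark.read.format("delta").load("/mnt/data/my_delta_table")
```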
05-06-2025 01:04 PM
A few things to check:

1. Enable Adaptive Query Execution (AQE):

```python
spark.conf.set("spark.sql.adaptive.enabled", "true")
```

and use skew hints to handle skewed joins or aggregations.

2. Executor memory: the unified execution/storage pool is roughly 0.6 * (availableMemory - reservedMemory), where approximately 13 GB per executor is available for execution and storage tasks. If tasks for large partitions require more memory, spilling to disk occurs, which can lead to OOM errors. The fraction can be adjusted:

```python
spark.conf.set("spark.memory.fraction", "0.8")  # Adjusts execution memory fraction
```

3. Review the query plan (via the explain() function) to understand how transformations and actions are executed. An optimal logical and physical plan helps avoid performance bottlenecks.

05-06-2025 11:31 PM - edited 05-06-2025 11:33 PM
Thank you so much for the detailed response, much appreciated. Two follow-up questions:
05-07-2025 04:18 AM
We will get back to you shortly.
05-07-2025 06:07 AM
You need to enable more metrics. Click on the hotlink below and turn on all of the metrics.
05-07-2025 06:09 AM
Oops, that's the wrong pic. Here's the correct one.
05-07-2025 06:35 AM
I'm guessing you are running one or more wide transformations in your query, and that is causing skewed shuffle partitions. Go back to the Stages tab and check out the 'Shuffle Write Size/Records' row.
05-07-2025 06:40 AM
I'm guessing you have shuffle write sizes that are > 1 GB. That's when things start going down the rathole with spill and OOM. Here are a few questions for you: Is Adaptive Query Execution enabled? Also, I saw in your earlier screenshot that you had some nasty Java garbage collection. Is your cluster Photon-enabled? Photon can reduce the GC pressure.
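If it helps, a quick way to check the AQE-related settings from a notebook; this is a generic sketch using standard Spark 3.x config keys, not settings taken from this specific cluster:

```python
# `spark` is the active SparkSession (pre-defined in a Databricks notebook).
# Inspect the effective AQE settings on the current cluster.
print(spark.conf.get("spark.sql.adaptive.enabled", "unset"))           # AQE on/off
print(spark.conf.get("spark.sql.adaptive.skewJoin.enabled", "unset"))  # automatic skew-join handling
```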
05-07-2025 06:43 AM
Other things to consider: By any chance do you have spot instances for workers turned on (edge case)? I've seen that handcuff AQE. If you have a join, do you have the smaller table as the first table in the JOIN? Are you running ANALYZE TABLE, which can change the join strategy to one that won't go OOM? These are some things to consider.
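To illustrate the last two points, a hedged sketch (table and column names are placeholders, not from this job): collect statistics so the optimizer can pick a better join strategy, and explicitly broadcast the smaller side of the join.

```python
from pyspark.sql.functions import broadcast

# `spark` is the active SparkSession (pre-defined in a Databricks notebook).
# Collect table/column statistics so the optimizer can choose a better join strategy.
spark.sql("ANALYZE TABLE my_db.small_dim COMPUTE STATISTICS FOR ALL COLUMNS")

# Explicitly broadcast the smaller table in the join.
large_df = spark.table("my_db.big_fact")
small_df = spark.table("my_db.small_dim")
joined = large_df.join(broadcast(small_df), "join_key")
```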
05-07-2025 08:06 AM
Thanks @mark_ott and @BigRoux for the prompt responses, much appreciated.
The job uses the cluster/settings below.
I just enabled 'Show Additional Metrics' on the stage and am attaching the job/stage/task details from the Spark UI. Only a single job and stage has failed, and there is no shuffle write. Isn't AQE enabled by default on Spark 3 onwards?
05-08-2025 06:04 AM
OK, without having your code or DAG, it's a little difficult to figure this out, but here's something that should work. First, figure out how many memory partitions you have. Apparently your memory partitions are too big for the cluster, hence the OOM. Use this generic code as a template:
```python
num_partitions = df.rdd.getNumPartitions()
print(num_partitions)
```
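As a rough extension of that template, one way to estimate whether those partitions are oversized; this is an illustrative sketch, not from the thread, and assumes the ~55 GB input size mentioned in the question:

```python
# Rough per-partition size estimate: input volume divided by partition count.
# Note the in-memory (decompressed) size of Parquet data can be several times larger.
input_size_gb = 55  # approximate input size from the question
num_partitions = df.rdd.getNumPartitions()
print(f"{num_partitions} partitions, ~{input_size_gb / num_partitions:.2f} GB each on disk")
```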
05-08-2025 06:08 AM
Next, use repartition(n) to increase your DataFrame to twice the number of partitions you got earlier. For example, if num_partitions was 30, then call repartition(60) prior to running your query. With half the data in each memory partition, I'm guessing you won't OOM. If you still do, double the number again until the OOM disappears.
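A minimal sketch of that doubling approach, assuming df is the DataFrame read from the Delta table:

```python
# Double the partition count so each memory partition holds roughly half the data.
num_partitions = df.rdd.getNumPartitions()
df = df.repartition(num_partitions * 2)

# ... run the rest of the query on the repartitioned DataFrame ...
# If the job still OOMs, double again (x4, x8, ...) until it fits.
```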