
Troubleshooting Spill

lawrence009
Contributor

I am trying to troubleshoot why spill occurred during DeltaOptimizeWrite. I am running a 64-core cluster with 256 GB RAM, which I would expect to handle this amount of data (see attached DAG).

[Attachment: IMG_1085.jpeg (DAG screenshot)]
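For context, the DeltaOptimizeWrite stage in the DAG comes from Databricks optimized writes, which are typically enabled per session or per table. A minimal sketch of those settings (the table name below is hypothetical, not from this post):

```
# Session-level: enable optimized writes for Delta writes in this session.
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")

# Table-level: the same behaviour as a Delta table property.
spark.sql("""
    ALTER TABLE example_db.example_table
    SET TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true)
""")
```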

4 REPLIES

Finleycartwrigh
New Contributor II

A few common causes to check:

- Data skewness: some tasks might be processing far more data than others.
- Incorrect resource allocation: ensure that Spark configurations (such as spark.executor.memory and spark.executor.cores) are set appropriately.
- Complex computations: the operations in the DAG might be too complex, causing excessive memory usage.
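To make the first two points concrete, here is a rough sketch with example values only (`df` stands in for the DataFrame being written; the memory and core numbers are illustrative, not recommendations):

```
# Executor sizing is fixed at cluster start, so these belong in the cluster's
# Spark config rather than being changed at runtime:
#   spark.executor.memory  32g
#   spark.executor.cores   8

# Quick skew check: row counts per partition just before the write.
from pyspark.sql.functions import spark_partition_id

(df.withColumn("pid", spark_partition_id())
   .groupBy("pid")
   .count()
   .orderBy("count", ascending=False)
   .show(10))
```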

Kaniz
Community Manager

Hi @lawrence009, spill during DeltaOptimizeWrite can occur for various reasons:


- Possible issue: running out of Java heap space


- Troubleshooting steps:
 • Clarify the issue and collect details (Notebook URL, Cluster URL, Consent to run commands, Time duration, Executor log)
 • Identify the problem through the Spark UI (look for java.lang.OutOfMemoryError: Java heap space)
 • Check driver logs for error messages (e.g., Spark Connector Worker: hit upload error)
 • Check executor logs for error messages (in the spark-executor/ip=<ip_address of the worker>/<executorId>/log4j file)
 • Analyze the stack trace to identify problematic steps in the code
 • If the stack trace shows com.esotericsoftware.kryo.KryoException: java.lang.NegativeArraySizeException in the serialization trace, try a workaround by increasing spark.kryoserializer.buffer.max.mb
 • Implement the solution by increasing spark.kryoserializer.buffer.max.mb as needed (refer to the Spark Configuration documentation; see the sketch after this list)
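A minimal sketch of that last step, assuming the Kryo buffer really is the bottleneck. The 1024m value is only an example; recent Spark versions use the key spark.kryoserializer.buffer.max (the .mb form is the legacy name), and serializer settings must be supplied at session or cluster startup (on Databricks, via the cluster's Spark config):

```
from pyspark.sql import SparkSession

# Example only: raise the maximum Kryo serialization buffer at session startup.
spark = (SparkSession.builder
         .config("spark.kryoserializer.buffer.max", "1024m")
         .getOrCreate())
```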

Tharun-Kumar
Honored Contributor II

@lawrence009 

You can also take a look at the individual task-level metrics. These should help you understand whether skew was involved during processing. You can also get a better picture of the spill by viewing the Task Level Summary, which records aggregated metrics at the min, 25th, 50th, 75th, and max percentiles.
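For reference, the same task-level summary can be pulled from Spark's monitoring REST API, which returns metric distributions (including spill bytes) at the requested quantiles. This sketch assumes a directly reachable Spark UI; on Databricks the UI is proxied, so reading the Summary Metrics table on the stage page is usually easier. The host and stage ID below are placeholders:

```
import requests

base = "http://<driver-host>:4040/api/v1"   # placeholder host/port
app_id = requests.get(f"{base}/applications").json()[0]["id"]
stage_id = 0                                # the stage that spilled (from the DAG)

summary = requests.get(
    f"{base}/applications/{app_id}/stages/{stage_id}/0/taskSummary",
    params={"quantiles": "0.0,0.25,0.5,0.75,1.0"},
).json()

print(summary.get("memoryBytesSpilled"))    # per-quantile spill to memory
print(summary.get("diskBytesSpilled"))      # per-quantile spill to disk
```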

jose_gonzalez
Moderator

You can resolve the spill to memory by increasing the number of shuffle partitions, but 16 GB of spill should not have a major impact on your job execution. Could you share more details about the actual source code that you are running?
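A minimal sketch of that suggestion (the partition count is illustrative, not a tuned value); on recent runtimes, adaptive query execution can also coalesce or split shuffle partitions automatically:

```
# Raise the shuffle partition count so each task shuffles less data.
spark.conf.set("spark.sql.shuffle.partitions", "400")

# Let AQE tune partition sizes automatically where supported.
spark.conf.set("spark.sql.adaptive.enabled", "true")
```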