Re: Troubleshooting Spill in Data Engineering

Troubleshooting Spill

lawrence009 — Thu, 07 Sep 2023 06:16:27 GMT

I am trying to troubleshoot why spill occurred during DeltaOptimizeWrite. I am running a 64-core cluster with 256 GB of RAM, which I expected to be able to handle this amount of data (see attached DAG).

Re: Troubleshooting Spill

Finleycartwrigh — Thu, 07 Sep 2023 06:17:40 GMT

Data skew: some tasks may be processing much more data than others.
Incorrect resource allocation: make sure Spark configurations such as spark.executor.memory and spark.executor.cores are set appropriately.
Complex computations: the operations in the DAG may be too complex, causing excessive memory usage.
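A quick way to check for data skew is to compare the largest task against a typical one. Below is a minimal sketch in plain Python. It assumes you have copied per-task sizes (for example the input or shuffle read size of each task) out of the Spark UI by hand. The helper names `skew_ratio` and `is_skewed` and the cut-off of 5 are hypothetical and for illustration only. They are not Spark APIs or Spark defaults.

```python
from statistics import median


def skew_ratio(task_sizes):
    """Return max / median of per-task sizes; a large ratio hints at skew.

    task_sizes: per-task sizes (any consistent unit), copied from the
    Spark UI task table. Hypothetical helper, not a Spark API.
    """
    if not task_sizes:
        raise ValueError("need at least one task")
    med = median(task_sizes)
    if med == 0:
        # All-zero median: any non-zero task is infinitely larger than typical.
        return float("inf") if max(task_sizes) > 0 else 1.0
    return max(task_sizes) / med


def is_skewed(task_sizes, threshold=5.0):
    # threshold is an arbitrary illustrative cut-off, not a Spark default.
    return skew_ratio(task_sizes) >= threshold


# Five tasks, sizes in MB: one straggler reads far more than the rest.
sizes_mb = [120, 130, 125, 128, 2000]
print(skew_ratio(sizes_mb))  # 2000 / 128
print(is_skewed(sizes_mb))
```

If the ratio is close to 1, the tasks are evenly sized and spill more likely comes from partitions that are too large overall, not from skew.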

Re: Troubleshooting Spill

Tharun-Kumar — Thu, 07 Sep 2023 09:42:00 GMT

@lawrence009

You can also look at the individual task-level metrics, which should show whether skew was involved during processing. The Task Level Summary also gives a better picture of the spill: it records aggregated values at the min, 25th, 50th (median), 75th, and max percentiles.
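The percentile summary described above can be approximated offline from a list of per-task values. Here is a minimal sketch, assuming you have the per-task spill (or input) values as numbers. It uses linear interpolation between ranks. Spark's own percentile computation may differ slightly, so treat this as an illustration of how to read the summary, not a reproduction of the UI. The function name `task_summary` is hypothetical.

```python
def task_summary(values):
    """Summarize a per-task metric at min, 25th, 50th, 75th and max.

    Uses linear interpolation between closest ranks; Spark's UI may
    compute percentiles differently. Hypothetical helper, not a Spark API.
    """
    if not values:
        raise ValueError("need at least one task")
    data = sorted(values)
    n = len(data)

    def pct(p):
        # Fractional position of the p-th quantile in the sorted data.
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return data[lo] + (data[hi] - data[lo]) * frac

    return {
        "min": data[0],
        "p25": pct(0.25),
        "median": pct(0.5),
        "p75": pct(0.75),
        "max": data[-1],
    }


# Illustrative spill values in MB for five tasks (not the poster's data).
print(task_summary([0, 0, 0, 500, 16000]))
```

Reading the result: when the max is far above the 75th percentile, a few tasks account for most of the spill, which points to skew. When the spill is similar across all percentiles, every partition is too large, and more shuffle partitions is the usual remedy.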

Re: Troubleshooting Spill

jose_gonzalez — Fri, 08 Sep 2023 22:46:17 GMT

You can reduce the spill by increasing the number of shuffle partitions (spark.sql.shuffle.partitions), but 16 GB of memory spill should not have a major impact on your job's execution. Could you share more details about the actual source code you are running?
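To choose a larger shuffle partition count, one common rule of thumb is to size partitions to a fixed target and round up to a multiple of the core count, so each wave keeps the cluster busy. Below is a minimal sketch of that arithmetic. It assumes a target of 128 MiB per partition and defaults to the 64 cores from the question. Both are assumptions, and the helper `suggested_shuffle_partitions` is hypothetical. This thread does not confirm whether spark.sql.shuffle.partitions governs the shuffle inside DeltaOptimizeWrite.

```python
import math


def suggested_shuffle_partitions(shuffle_bytes,
                                 target_partition_bytes=128 * 1024**2,
                                 total_cores=64):
    """Rough partition count so each partition is about the target size.

    Rounds up to a multiple of total_cores so every task wave fills the
    cluster. Rule-of-thumb sketch; the 128 MiB target is an assumption.
    """
    if shuffle_bytes <= 0:
        return total_cores
    by_size = math.ceil(shuffle_bytes / target_partition_bytes)
    waves = math.ceil(by_size / total_cores)
    return waves * total_cores


# 100 GiB of shuffle data -> 800 partitions by size -> 13 waves of 64.
print(suggested_shuffle_partitions(100 * 1024**3))
```

The result could then be applied with spark.conf.set("spark.sql.shuffle.partitions", n) before the stage runs. If Adaptive Query Execution is enabled, Spark may coalesce small shuffle partitions afterwards, so the effective count can end up lower than the configured value.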