Troubleshooting Spill

lawrence009
Contributor

I am trying to troubleshoot why spill occurred during DeltaOptimizeWrite. I am running a 64-core cluster with 256 GB RAM, which I expect to be able to handle this amount of data (see attached DAG).

[Attachment: IMG_1085.jpeg — DAG screenshot]

Finleycartwrigh
New Contributor II

A few possible causes:

- Data skew: some tasks may be processing far more data than others.
- Incorrect resource allocation: ensure Spark configurations (like spark.executor.memory, spark.executor.cores, etc.) are set appropriately.
- Complex computations: the operations in the DAG may be too complex, causing excessive memory usage.
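As a starting point, you can inspect the current values from a notebook before changing anything. A minimal sketch, assuming the spark session Databricks provides (note that on Databricks these executor properties are normally set in the cluster's Spark config, not at runtime):

```python
# Inspect current executor settings; these keys are standard Spark
# properties. On Databricks they come from the cluster's Spark config.
print(spark.conf.get("spark.executor.memory", "not set"))
print(spark.conf.get("spark.executor.cores", "not set"))
print(spark.conf.get("spark.sql.shuffle.partitions"))
```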

Tharun-Kumar
Databricks Employee

@lawrence009 

You can also take a look at the individual task-level metrics. This should help you understand whether skew was involved during processing. You can also get a better understanding of the spill by viewing the Task Level Summary; we record aggregated information at the min, 25th, 50th, and 75th percentiles, and the max.
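If you want a quick programmatic check to complement the Spark UI, a sketch like this shows the row-count distribution across partitions at the same percentiles (assuming a DataFrame loaded from your source; "my_table" is a placeholder name):

```python
import pyspark.sql.functions as F

# Count rows per partition; a large gap between the 75th percentile and
# the max suggests skew.
df = spark.read.table("my_table")
rows_per_partition = (
    df.withColumn("partition_id", F.spark_partition_id())
      .groupBy("partition_id")
      .count()
)
rows_per_partition.select("count").summary("min", "25%", "50%", "75%", "max").show()
```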

jose_gonzalez
Databricks Employee

You can resolve the spill to memory by increasing the number of shuffle partitions, but 16 GB of spill should not have a major impact on your job execution. Could you share more details on the actual source code that you are running?
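For reference, a minimal sketch of the shuffle-partition change (400 is an assumed value to tune against your data volume; the default is 200). Enabling adaptive query execution is often a better first step, since it sizes shuffle partitions automatically:

```python
# Raise the shuffle partition count so each shuffle task handles less
# data and is less likely to spill; tune the value for your workload.
spark.conf.set("spark.sql.shuffle.partitions", "400")

# Alternatively, let AQE coalesce/size shuffle partitions automatically.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
```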