Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Troubleshooting Spill

lawrence009
Contributor

I am trying to troubleshoot why spill occurred during DeltaOptimizeWrite. I am running a 64-core cluster with 256 GB RAM, which I would expect to handle this amount of data (see attached DAG).

IMG_1085.jpeg (attached DAG screenshot)

3 REPLIES

Finleycartwrigh
New Contributor II

A few common causes of spill:

- Data skewness: some tasks might be processing far more data than others.
- Incorrect resource allocation: ensure that Spark configurations (like spark.executor.memory, spark.executor.cores, etc.) are set appropriately.
- Complex computations: the operations in the DAG might be too complex, causing excessive memory usage.
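For the resource-allocation point, a minimal sketch of executor sizing (illustrative values only, not a recommendation for this specific cluster; on Databricks these are normally set in the cluster's Spark config UI rather than in code):

```python
# Sketch only: example values, tune to your own cluster.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Per-executor heap; too small a heap forces Spark to spill to disk.
    .config("spark.executor.memory", "32g")
    # Cores per executor; fewer cores per executor leaves more memory
    # available to each concurrently running task.
    .config("spark.executor.cores", "8")
    .getOrCreate()
)
```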

Tharun-Kumar
Databricks Employee
Databricks Employee

@lawrence009 

You can also take a look at the individual task-level metrics. These should help you understand whether skew was involved during processing. You can also get a better picture of the spill by viewing the Task Level Summary: we record aggregated information at the min, 25th, 50th, 75th, and max percentiles.
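To illustrate how that percentile summary reveals skew, here is a plain-Python sketch over hypothetical per-task spill sizes (the values are made up; in practice you would read them from the Stages page of the Spark UI):

```python
# Hypothetical per-task spill sizes in MB, one value per task.
task_spill_mb = [0, 0, 4, 6, 7, 9, 12, 850]  # one outlier task

def percentile(sorted_vals, p):
    """Nearest-rank percentile over a sorted list (0 <= p <= 100)."""
    if p <= 0:
        return sorted_vals[0]
    if p >= 100:
        return sorted_vals[-1]
    k = -(-p * len(sorted_vals) // 100) - 1  # ceil(p*n/100) - 1
    return sorted_vals[k]

vals = sorted(task_spill_mb)
summary = {p: percentile(vals, p) for p in (0, 25, 50, 75, 100)}
print(summary)
# A max far above the 75th percentile (850 MB vs 9 MB here) is the
# classic signature of skew: one task is handling most of the data.
```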

jose_gonzalez
Databricks Employee
Databricks Employee

You can resolve the spill to memory by increasing the number of shuffle partitions, but 16 GB of spill should not have a major impact on your job execution. Could you share more details on the actual source code that you are running?
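A minimal sketch of that tuning (illustrative values; assumes an active SparkSession named `spark`):

```python
# Spread shuffled data over more partitions so each task's working set
# fits in memory instead of spilling. The default is 200.
spark.conf.set("spark.sql.shuffle.partitions", "512")

# On recent runtimes, Adaptive Query Execution can also split skewed
# shuffle partitions automatically.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
```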
