Hi @lawrence009, spill during DeltaOptimizeWrite can occur for various reasons
- Possible issue: the job is running out of Java heap space (an OutOfMemoryError)
- Troubleshooting steps:
• Clarify the issue and collect details (notebook URL, cluster URL, consent to run commands, time window of the failure, executor logs)
• Identify the problem through the Spark UI (look for java.lang.OutOfMemoryError: Java heap space in the failed stages/tasks)
• Check the driver logs for error messages (e.g., Spark Connector Worker: hit upload error)
• Check the executor logs for error messages (in the spark-executor/ip=<ip_address of the worker>/<executorId>/log4j file); a small log-scanning sketch follows this list
• Analyze the stack trace to identify problematic steps in the code
• Try a workaround if the stack trace shows com.esotericsoftware.kryo.KryoException: java.lang.NegativeArraySizeException followed by a "Serialization trace": increase the maximum Kryo serializer buffer
• Implement the solution by raising spark.kryoserializer.buffer.max (the older spark.kryoserializer.buffer.max.mb name is deprecated) to a value that fits your workload; refer to the Spark Configuration documentation and see the config sketch after this list
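If you have cluster log delivery enabled and have downloaded the logs, a quick scan can surface the errors mentioned above. Here is a minimal sketch, assuming the delivered logs have been copied to a local cluster-logs/ directory (that path and the glob pattern are assumptions; adjust them to wherever your log4j files actually land):

```python
import glob
import re

# Patterns covering the errors discussed above: heap OOM and the Kryo failure.
PATTERN = re.compile(
    r"java\.lang\.OutOfMemoryError"
    r"|com\.esotericsoftware\.kryo\.KryoException"
    r"|java\.lang\.NegativeArraySizeException"
)

# Assumed local copy of the delivered logs; adjust to your actual layout,
# e.g. spark-executor/ip=<ip>/<executorId>/log4j under the delivery root.
for path in glob.glob("cluster-logs/**/log4j*", recursive=True):
    with open(path, errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            if PATTERN.search(line):
                print(f"{path}:{lineno}: {line.rstrip()}")
```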
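For the Kryo workaround, keep in mind that serializer settings must be in place before the SparkContext starts, so on Databricks they belong in the cluster's Spark config box rather than in a notebook cell. Below is a minimal sketch of the equivalent standalone PySpark setup; the 512m value is an assumption, size it to your largest serialized records:

```python
from pyspark.sql import SparkSession

# On Databricks, put these two settings in the cluster's Spark config instead:
#   spark.serializer org.apache.spark.serializer.KryoSerializer
#   spark.kryoserializer.buffer.max 512m
spark = (
    SparkSession.builder
    .appName("kryo-buffer-workaround")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    # Raise the max Kryo buffer (default 64m) so large records can be
    # serialized without hitting the buffer limit.
    .config("spark.kryoserializer.buffer.max", "512m")
    .getOrCreate()
)
```

You can confirm the setting took effect from a notebook with spark.conf.get("spark.kryoserializer.buffer.max").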