Executors getting killed while scaling Spark jobs on GPU using RAPIDS (NVIDIA)

rajanchaturvedi · 06-16-2025

Hi Team,

I want to take advantage of Spark distribution over GPU clusters using the NVIDIA RAPIDS Accelerator for Apache Spark. Everything is set up:

1. The jar is loaded correctly via an init script. I downloaded the jar and uploaded it to a volume (the workspace is Unity Catalog enabled), and the init script copies it to the Databricks jar location:

src="/Volumes/ml_apps_ml_dev/volumes/team-volume-ml_apps_nonprod/rapids-4-spark_2.12-25.04.0.jar"

DEST="/databricks/jars/rapids-4-spark_2.12-25.04.0.jar"
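A minimal sketch of the copy step such an init script might perform, assuming the volume path is readable from the init script at cluster start. The helper name `copy_jar` is illustrative only and is not taken from the original script.

```bash
#!/bin/bash
# Sketch of a cluster init script step: copy a jar from a volume into the
# cluster's jar directory. copy_jar is a hypothetical helper name.

copy_jar() {
  local src="$1"
  local dest="$2"

  # Fail early with a clear message if the source jar cannot be found.
  if [ ! -f "$src" ]; then
    echo "Source jar not found: $src" >&2
    return 1
  fi

  # Create the destination directory if needed, then copy (overwriting).
  mkdir -p "$(dirname "$dest")" || return 1
  cp -f "$src" "$dest" || return 1
  echo "Copied $src -> $dest"
}

# In the init script itself, the call would use the two paths shown above:
# copy_jar "$src" "$DEST"
```

Checking that the source file exists before copying makes a missing or unmounted volume show up as an explicit error in the init script log instead of a silently missing jar.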

Cluster that I am using: [screenshot not included]

Spark configuration that I am using: [screenshot not included]

After all this configuration, I can see the GPU optimizations kick in in the query execution plan. However, when I run a Spark operation such as a join, the executors get killed and the Spark job gets stuck. Please help.
