Executors getting killed while scaling Spark jobs on GPU using RAPIDS (NVIDIA)
06-16-2025 02:49 AM
Hi Team,
I want to take advantage of Spark distribution over GPU clusters using RAPIDS (NVIDIA). Everything is set up:
1. The jar is loaded correctly via an init script: the jar was downloaded and uploaded to a volume (the workspace is Unity Catalog enabled), and the init script copies it to the Databricks jar location.
src="/Volumes/ml_apps_ml_dev/volumes/team-volume-ml_apps_nonprod/rapids-4-spark_2.12-25.04.0.jar"
DEST="/databricks/jars/rapids-4-spark_2.12-25.04.0.jar"
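For reference, the copy step could look roughly like the following. This is a minimal sketch, not the exact init script in use: the function name `copy_jar` and the `RAPIDS_JAR_SRC` / `RAPIDS_JAR_DEST` override variables are hypothetical, and the default paths are simply the two quoted above.

```shell
#!/bin/bash
# Sketch of an init script step that copies the RAPIDS jar from a
# Unity Catalog volume into the Databricks jar directory.
set -eu

# Defaults are the paths quoted in the post. The override variables
# are hypothetical and only exist to make the sketch easy to test.
SRC="${RAPIDS_JAR_SRC:-/Volumes/ml_apps_ml_dev/volumes/team-volume-ml_apps_nonprod/rapids-4-spark_2.12-25.04.0.jar}"
DEST="${RAPIDS_JAR_DEST:-/databricks/jars/rapids-4-spark_2.12-25.04.0.jar}"

# copy_jar SRC DEST: fail loudly if the source jar is missing,
# otherwise create the destination directory and copy the jar.
copy_jar() {
  local src="$1" dest="$2"
  if [ ! -f "$src" ]; then
    echo "RAPIDS jar not found at $src" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dest")"
  cp "$src" "$dest"
  echo "Copied $src to $dest"
}
```

The script would end with a call such as `copy_jar "$SRC" "$DEST"`. Failing early when the source jar is missing makes a bad volume path show up in the init script logs instead of later as a missing plugin class.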
2. The cluster that I am using
3. The Spark configuration that I am using
After all this configuration, I can see the GPU optimizations kick in in the query execution plan, as below. But when I run a Spark operation such as a join, the executors get killed and the Spark job gets stuck. Kindly help.