Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Executors getting killed while scaling Spark jobs on GPU using RAPIDS (NVIDIA)

rajanchaturvedi
New Contributor

Hi Team,

I want to take advantage of Spark distribution over GPU clusters using RAPIDS (NVIDIA). Everything is set up:

1. The JAR is loaded correctly via an init script. The JAR was downloaded and uploaded to a volume (the workspace is Unity Catalog enabled), and the init script copies it to the Databricks JAR location:


src="/Volumes/ml_apps_ml_dev/volumes/team-volume-ml_apps_nonprod/rapids-4-spark_2.12-25.04.0.jar"


DEST="/databricks/jars/rapids-4-spark_2.12-25.04.0.jar"
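
For illustration, a minimal sketch of what such an init script could look like, assuming the step is a plain file copy from the volume to the JAR directory. The helper name `install_jar`, the existence check, and the warning message are hypothetical and not taken from the actual script; only the two paths come from the post.

```shell
#!/bin/bash
set -euo pipefail

# Paths as given in the post.
src="/Volumes/ml_apps_ml_dev/volumes/team-volume-ml_apps_nonprod/rapids-4-spark_2.12-25.04.0.jar"
DEST="/databricks/jars/rapids-4-spark_2.12-25.04.0.jar"

# Hypothetical helper: copy a JAR, returning non-zero if the source is missing.
install_jar() {
  local from="$1" to="$2"
  if [ ! -f "$from" ]; then
    echo "JAR not found at $from" >&2
    return 1
  fi
  cp "$from" "$to"
}

# Log a warning instead of aborting when the copy fails.
# Remove the "|| ..." part to make the cluster start fail when the JAR is missing.
install_jar "$src" "$DEST" || echo "WARNING: RAPIDS JAR was not installed" >&2
```

Whether to warn or to fail hard is a design choice: failing hard makes a missing JAR obvious at cluster start, while a warning only shows up in the init script logs.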

2. The cluster that I am using:

rajanchaturvedi_0-1750067083816.png

3. The Spark configuration that I am using:

rajanchaturvedi_1-1750067171780.png

With this configuration, I can see the GPU optimizations kick in in the query execution plan, as shown below. However, when I run a Spark operation such as a join, the executors get killed and the Spark job gets stuck. Kindly help.

rajanchaturvedi_2-1750067287042.png

 




 
