Machine Learning

NVIDIA driver update

ravi-kolluri_in
New Contributor II

I want to update the CUDA driver for the NVIDIA Tesla T4 GPU on the cluster.

I am using the following commands:

%sh
sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-ubuntu2204-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
 
The cell hangs without returning any result.
Please help.
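
One possible cause of the hang, offered as an assumption rather than a confirmed diagnosis: `apt-get --purge remove` is called without `-y`, so it may be waiting for a confirmation that a `%sh` cell cannot supply. The commands above also move `cuda-ubuntu2204.pin` without downloading it first. The sketch below is a non-interactive variant of the same steps. The pin URL is taken from NVIDIA's standard install instructions and is not in the original post; `run`, `apt_get`, `install_cuda`, and `DRY_RUN` are hypothetical names introduced here. The `nvidia-uninstall` step is left out because that tool may also prompt for input. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: non-interactive variant of the commands from the question.
# Prints the commands by default; set DRY_RUN=0 to actually execute them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
BASE="https://developer.download.nvidia.com/compute/cuda"
DEB="cuda-repo-ubuntu2204-11-8-local_11.8.0-520.61.05-1_amd64.deb"

# Echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# apt-get with -y and a non-interactive frontend so it never waits for input.
apt_get() {
  run sudo env DEBIAN_FRONTEND=noninteractive apt-get -y "$@"
}

install_cuda() {
  apt_get --purge remove "*nvidia*"
  # Assumption: the pin file has to be downloaded before it can be moved.
  run wget -q "$BASE/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin"
  run sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
  run wget -q "$BASE/11.8.0/local_installers/$DEB"
  run sudo dpkg -i "$DEB"
  run sudo cp /var/cuda-repo-ubuntu2204-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
  apt_get update
  apt_get install cuda
}

install_cuda
```

Running it once in dry-run mode shows the exact command sequence before anything on the node is changed. Whether replacing the driver is supported on a given Databricks Runtime is a separate question that this sketch does not answer.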
2 REPLIES

Kaniz
Community Manager

Hi @ravi-kolluri_in, please ensure you've selected the CUDA version appropriate for your machine and for the Databricks Runtime version you are using.
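
A quick way to see what needs to match is to print the OS release, the GPU and its current driver, and the runtime version from a `%sh` cell. This is a minimal sketch: `report_versions` is a hypothetical helper name, and it assumes the `DATABRICKS_RUNTIME_VERSION` environment variable is set on the cluster nodes.

```shell
#!/usr/bin/env bash
# Sketch: print the versions that need to match before picking an installer.

report_versions() {
  # OS release, e.g. "22.04" corresponds to the ubuntu2204 packages.
  if [ -r /etc/os-release ]; then
    ( . /etc/os-release; echo "os: ${ID:-unknown} ${VERSION_ID:-unknown}" )
  else
    echo "os: unknown (/etc/os-release not readable)"
  fi

  # GPU model and currently loaded driver version.
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
      || echo "gpu: nvidia-smi failed"
  else
    echo "gpu: nvidia-smi not found"
  fi

  # Databricks Runtime version (assumed to be exposed via this variable).
  echo "dbr: ${DATABRICKS_RUNTIME_VERSION:-not set}"
}

report_versions
```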

 

@Kaniz 

The Databricks Runtime compatible with that CUDA version (11.3) is DBR 11.X. Unfortunately, pyspark.ml.torch.distributed works only with DBR 13.X.

So going back to DBR 11.X does not solve the problem.