Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

Serving GPU Endpoint, can't find CUDA

kfab
New Contributor II

Hi everyone !
I'm encountering an issue while trying to serve my model on a GPU endpoint.
My model uses DeepSpeed, which needs CUDA, and I got the following error:

 

"An error occurred while loading the model. CUDA_HOME does not exist, unable to compile CUDA op(s)."

 

Not having access to the endpoint through a terminal makes it hard to debug the issue.
On the personal compute that I used to register and test the model, CUDA is installed and the model works fine. CUDA is installed in /usr/local/cuda, as mentioned in the documentation.

But on the endpoint it seems that it is not the case.

I first tried setting the CUDA_HOME environment variable manually to '/usr/local/cuda', hoping it would work, but it didn't. I got the following error:

 

"[Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc"

 

Now I'm starting to wonder whether the endpoint compute has CUDA installed at all, which would be weird if not, right?
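
For the manual CUDA_HOME attempt described above, a slightly more defensive variant is to set the variable only when the directory really contains the compiler. This is a minimal sketch, not a confirmed fix: the helper name `set_cuda_home_if_valid` is hypothetical, and '/usr/local/cuda' is simply the path tried in this post.

```python
import os


def set_cuda_home_if_valid(candidate: str) -> bool:
    """Set CUDA_HOME to `candidate` only if it contains an executable bin/nvcc.

    Hypothetical helper for illustration; returns True when CUDA_HOME was set.
    """
    nvcc_path = os.path.join(candidate, "bin", "nvcc")
    if os.path.isfile(nvcc_path) and os.access(nvcc_path, os.X_OK):
        os.environ["CUDA_HOME"] = candidate
        return True
    return False


# Usage: try the path from the post and report instead of failing later.
if not set_cuda_home_if_valid("/usr/local/cuda"):
    print("No nvcc under /usr/local/cuda; leaving CUDA_HOME unchanged")
```

Checking first turns the later "[Errno 2] No such file or directory" failure into an explicit message at load time.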

I ran these commands from my model loading method to check whether it could be installed elsewhere, but they found nothing:

 

import os

# Inspect likely install locations and check whether nvcc is on the PATH.
print(os.popen("ls -l /usr/local/").read())
print(os.popen("ls -l /opt/").read())
print(os.popen("nvcc --version").read())
print(os.popen("which nvcc").read())

 

[86bb6k8gpl] ls: cannot access '/usr/local/cuda': No such file or directory
[86bb6k8gpl] /bin/sh: 1: nvcc: not found
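
Since there is no terminal on the endpoint, the same check can be done in one pass from the model loading method. The sketch below is only an illustration: `run` and `find_nvcc` are hypothetical helper names, and the glob patterns are guesses at where a toolkit might live, not documented locations. Unlike `os.popen(...).read()`, it returns stderr together with stdout, so messages like "nvcc: not found" end up in the printed result.

```python
import glob
import os
import shutil
import subprocess
import sys


def run(cmd: str) -> str:
    """Run a shell command and return stdout and stderr combined."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr


def find_nvcc() -> list:
    """Return candidate nvcc paths from PATH and a few guessed locations."""
    candidates = []
    on_path = shutil.which("nvcc")
    if on_path:
        candidates.append(on_path)
    # Assumed search locations; adjust for the actual image.
    patterns = [
        "/usr/local/cuda*/bin/nvcc",
        "/opt/*/bin/nvcc",
        os.path.join(sys.prefix, "bin", "nvcc"),
    ]
    for pattern in patterns:
        candidates.extend(glob.glob(pattern))
    # Remove duplicates while keeping the discovery order.
    return list(dict.fromkeys(candidates))


print("CUDA_HOME =", os.environ.get("CUDA_HOME"))
print("nvcc candidates:", find_nvcc())
print(run("ls -l /usr/local/"))
```

An empty candidate list would suggest that the serving image ships without the CUDA compiler, rather than having it in an unusual place.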

I'm pretty new to Databricks so I may be missing something obvious. Maybe it is installed in a custom location, but that is hard to find one print statement at a time.
Any help would be appreciated 😅

1 REPLY

kfab
New Contributor II

Hi @Retired_mod ,

thanks for your reply !

I managed to install CUDA via conda 👍
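
For anyone landing here later, one way a conda-based install could be expressed is as a conda environment dictionary logged with the model. This is a sketch under assumptions and not the exact setup used here: the function name `build_conda_env`, the environment name, the channel, and the `cuda-nvcc` package choice are all illustrative, and the right package and version depend on the CUDA version the model's PyTorch build expects.

```python
def build_conda_env(python_version: str = "3.10") -> dict:
    """Build a conda environment spec that includes the CUDA compiler.

    All names below are illustrative assumptions, not the poster's config.
    """
    return {
        "name": "serving-env",          # hypothetical environment name
        "channels": ["conda-forge"],    # assumed channel
        "dependencies": [
            f"python={python_version}",
            "pip",
            "cuda-nvcc",                # assumed package providing nvcc
            {"pip": ["mlflow", "deepspeed"]},
        ],
    }


conda_env = build_conda_env()
print(conda_env["dependencies"])
```

A dictionary of this shape can be passed as the `conda_env` argument of `mlflow.pyfunc.log_model` when registering the model, so the serving environment is built with those dependencies.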

Also, I was wondering: is there any way to SSH into the serving endpoint?
