Community Platform Discussions

Serving GPU Endpoint, can't find CUDA

kfab
New Contributor II

Hi everyone!
I'm encountering an issue while trying to serve my model on a GPU endpoint.
My model uses DeepSpeed, which needs CUDA to compile its ops, and I got the following error:

 

"An error occurred while loading the model. CUDA_HOME does not exist, unable to compile CUDA op(s)."

 

Not having terminal access to the endpoint makes the issue hard to debug.
On the personal compute that I used to register and test the model, CUDA is installed and the model works fine. CUDA is installed in /usr/local/cuda, as mentioned in the documentation.

But on the endpoint that does not seem to be the case.

I first tried setting the CUDA_HOME environment variable manually to '/usr/local/cuda', hoping it would work, but it didn't. I got the following error:

 

"[Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc"

 

Now I'm starting to wonder whether the endpoint compute has CUDA installed at all. It would be weird if it didn't, right?

I ran these commands from my model loading method to check whether CUDA was installed elsewhere, but they returned nothing:

 

import os

# List likely install locations, then ask the shell where nvcc is.
print(os.popen("ls -l /usr/local/").read())
print(os.popen("ls -l /opt/").read())
print(os.popen("nvcc --version").read())
print(os.popen("which nvcc").read())

 

[86bb6k8gpl] ls: cannot access '/usr/local/cuda': No such file or directory
[86bb6k8gpl] /bin/sh: 1: nvcc: not found

I'm pretty new to Databricks, so I may be missing something obvious. Maybe CUDA is installed in a custom location, but it is hard to find one print statement at a time.
Any help would be appreciated 😅
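
For anyone debugging the same thing, the print-by-print search above can be collected into one helper that runs inside the model loading method. This is only a minimal sketch: the function name `find_cuda_candidates` and the list of directories in `CANDIDATE_ROOTS` are assumptions for illustration, not locations that Databricks documents for serving endpoints.

```python
import os
import shutil
from pathlib import Path

# Hypothetical places to look; adjust for the image you are running on.
CANDIDATE_ROOTS = ["/usr/local", "/opt", "/usr/lib"]


def find_cuda_candidates(roots=CANDIDATE_ROOTS):
    """Collect hints about where a CUDA toolkit might live."""
    report = {
        "CUDA_HOME": os.environ.get("CUDA_HOME"),  # None if unset
        "nvcc_on_path": shutil.which("nvcc"),      # None if nvcc is not on PATH
        "cuda_dirs": [],                           # (directory, has bin/nvcc) pairs
    }
    for root in roots:
        root_path = Path(root)
        if not root_path.is_dir():
            continue
        # Match directories such as cuda, cuda-11.8 or cuda-12.
        for entry in sorted(root_path.glob("cuda*")):
            if entry.is_dir():
                nvcc = entry / "bin" / "nvcc"
                report["cuda_dirs"].append((str(entry), nvcc.is_file()))
    return report


if __name__ == "__main__":
    for key, value in find_cuda_candidates().items():
        print(f"{key}: {value}")
```

If `nvcc_on_path` is None and `cuda_dirs` is empty, the toolkit is probably not present in the searched locations, which would match the errors quoted above.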

1 REPLY

kfab
New Contributor II

Hi @Retired_mod,

Thanks for your reply!

I managed to install CUDA via conda 👍

Also, I was wondering: is there any way to SSH into the serving endpoint?
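
The post does not say exactly how the conda install was done, so the following is only a sketch of one way it might look. It assumes the model is logged with a conda environment, that the `nvidia` channel's `cuda-toolkit` package provides `nvcc`, and that conda places `nvcc` under the environment's `bin` directory. The environment name, the Python version, the unpinned packages and the helper `set_cuda_home` are all hypothetical.

```python
import os
import shutil
import sys
from pathlib import Path

# Assumed conda spec for the logged model; names and versions are illustrative.
CONDA_ENV = {
    "name": "serving-env",
    "channels": ["nvidia", "conda-forge"],
    "dependencies": [
        "python=3.10",
        "cuda-toolkit",  # assumed to ship nvcc into <env>/bin
        "pip",
        {"pip": ["mlflow", "deepspeed"]},
    ],
}


def set_cuda_home(prefix=sys.prefix):
    """Point CUDA_HOME at the environment that contains nvcc, if one is found."""
    nvcc = Path(prefix) / "bin" / "nvcc"
    if nvcc.is_file():
        cuda_home = str(Path(prefix))
    else:
        # Fall back to whatever nvcc is on PATH, if any.
        found = shutil.which("nvcc")
        if found is None:
            return None
        # nvcc lives in <cuda_home>/bin/nvcc, so go up two levels.
        cuda_home = str(Path(found).parent.parent)
    os.environ["CUDA_HOME"] = cuda_home
    return cuda_home
```

A dictionary like `CONDA_ENV` could be passed as the `conda_env` argument when logging the model with MLflow, and `set_cuda_home()` could be called at the top of the model loading method, before DeepSpeed is imported, so that the `CUDA_HOME` lookup from the original error has something to find.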
