<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Serving GPU Endpoint, can't find CUDA in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/serving-gpu-endpoint-can-t-find-cuda/m-p/61207#M6567</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Thanks for your reply!&lt;/P&gt;&lt;P&gt;I managed to install CUDA via conda&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":thumbs_up:"&gt;👍&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I was also wondering: is there any way to SSH into the serving endpoint?&lt;/P&gt;</description>
    <pubDate>Tue, 20 Feb 2024 08:41:58 GMT</pubDate>
    <dc:creator>kfab</dc:creator>
    <dc:date>2024-02-20T08:41:58Z</dc:date>
    <item>
      <title>Serving GPU Endpoint, can't find CUDA</title>
      <link>https://community.databricks.com/t5/get-started-discussions/serving-gpu-endpoint-can-t-find-cuda/m-p/60203#M6565</link>
      <description>&lt;P&gt;Hi everyone!&lt;BR /&gt;I'm encountering an issue while trying to serve my model on a GPU endpoint.&lt;BR /&gt;My model uses DeepSpeed, which needs to compile CUDA ops, and I got the following error:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;"An error occurred while loading the model. CUDA_HOME does not exist, unable to compile CUDA op(s)."&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Not having terminal access to the endpoint makes the issue hard to debug.&lt;BR /&gt;On the personal compute that I used to register and test the model, CUDA is installed and the model works fine. CUDA is installed in /usr/local/cuda, as mentioned in the documentation.&lt;/P&gt;&lt;P&gt;But on the endpoint that does not seem to be the case.&lt;/P&gt;&lt;P&gt;I first tried manually setting the CUDA_HOME environment variable to '/usr/local/cuda', hoping it would work, but it didn't. I got the following error:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;"[Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'"&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now I'm starting to wonder whether the endpoint compute has CUDA installed at all. It would be weird if it didn't, right?&lt;/P&gt;&lt;P&gt;I ran these commands from my model loading method to check whether it was installed elsewhere, but they returned nothing:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import os

# List likely install locations, then check whether nvcc is on the PATH.
print(os.popen("ls -l /usr/local/").read())
print(os.popen("ls -l /opt/").read())
print(os.popen("nvcc --version").read())
print(os.popen("which nvcc").read())
&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;[86bb6k8gpl] ls: cannot access '/usr/local/cuda': No such file or directory&lt;BR /&gt;[86bb6k8gpl] /bin/sh: 1: nvcc: not found&lt;/P&gt;&lt;P&gt;I'm pretty new to Databricks, so I may be missing something obvious. Maybe CUDA is installed in a custom location, but that is hard to find print by print.&lt;BR /&gt;Any help would be appreciated &lt;span class="lia-unicode-emoji" title=":grinning_face_with_sweat:"&gt;😅&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 14 Feb 2024 12:58:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/serving-gpu-endpoint-can-t-find-cuda/m-p/60203#M6565</guid>
      <dc:creator>kfab</dc:creator>
      <dc:date>2024-02-14T12:58:09Z</dc:date>
    </item>
    <item>
      <title>Re: Serving GPU Endpoint, can't find CUDA</title>
      <link>https://community.databricks.com/t5/get-started-discussions/serving-gpu-endpoint-can-t-find-cuda/m-p/61207#M6567</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Thanks for your reply!&lt;/P&gt;&lt;P&gt;I managed to install CUDA via conda&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":thumbs_up:"&gt;👍&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I was also wondering: is there any way to SSH into the serving endpoint?&lt;/P&gt;</description>
      <pubDate>Tue, 20 Feb 2024 08:41:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/serving-gpu-endpoint-can-t-find-cuda/m-p/61207#M6567</guid>
      <dc:creator>kfab</dc:creator>
      <dc:date>2024-02-20T08:41:58Z</dc:date>
    </item>
  </channel>
</rss>

