Databricks Community

m_koch_unify · ‎07-20-2023

Hi all,

I was following the Hugging Face model card at https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ, which recommends ExLlama (https://github.com/turboderp/exllama/), a library that supports 4-bit quantization.

I'm running on A10-Single-GPU-64GB compute.

I've cloned the ExLlama repo and have the model files in DBFS. When running

"python test_benchmark_inference.py -d <path_to_model_files> -p -ppl"

I get an error. It looks like the Python script tries to JIT-compile the ExLlama CUDA extension, and the build fails because it cannot find cusparse.h, a header that is part of the CUDA libraries (cuSPARSE).
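One way to confirm this before changing anything is to check whether the header exists in the include directories the compiler searches. The following is a minimal sketch, assuming the CUDA include path shown in the `-isystem` flags of the build log (`/usr/local/cuda/include`); the helper name `find_header` is hypothetical and not part of ExLlama or PyTorch.

```python
import os


def find_header(name, include_dirs):
    """Return the full paths where header `name` exists among include_dirs."""
    hits = []
    for directory in include_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Assumption: this is the CUDA include directory passed to nvcc in the log.
    dirs = ["/usr/local/cuda/include"]
    found = find_header("cusparse.h", dirs)
    if found:
        print("Found:", found)
    else:
        print("cusparse.h not found in:", ", ".join(dirs))
```

If the header is reported as missing, the compile failure is likely an environment issue rather than a problem in the ExLlama sources.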

Any help is appreciated, thanks in advance.

Here is the full trace:

Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1893, in run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/test_benchmark_inference.py", line 1, in <module>
    from model import ExLlama, ExLlamaCache, ExLlamaConfig
  File "/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/model.py", line 12, in <module>
    import cuda_ext
  File "/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/cuda_ext.py", line 43, in <module>
    exllama_ext = load(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return jit_compile(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1509, in jit_compile
    write_ninja_file_and_build_library(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1624, in write_ninja_file_and_build_library
    run_ninja_build(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1909, in run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'exllama_ext': [1/6] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=exllama_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/TH -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -lineinfo -std=c++17 -c '/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/cuda_func/q4_mlp.cu' -o q4_mlp.cuda.o
FAILED: q4_mlp.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=exllama_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/TH -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -lineinfo -std=c++17 -c '/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/cuda_func/q4_mlp.cu' -o q4_mlp.cuda.o
In file included from /Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/cuda_func/q4_mlp.cuh:7,
                 from /Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/cuda_func/q4_mlp.cu:1:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
    6 | #include <cusparse.h>
      |          ^~~~~~~~~~~~
compilation terminated.
[2/6] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=exllama_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/TH -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -lineinfo -std=c++17 -c '/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/cuda_func/q4_matmul.cu' -o q4_matmul.cuda.o
FAILED: q4_matmul.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=exllama_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/TH -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -lineinfo -std=c++17 -c '/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/cuda_func/q4_matmul.cu' -o q4_matmul.cuda.o
In file included from /Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/cuda_func/q4_matmul.cuh:8,
                 from /Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/cuda_func/q4_matmul.cu:1:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
    6 | #include <cusparse.h>
      |          ^~~~~~~~~~~~
compilation terminated.
[3/6] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=exllama_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/TH -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -lineinfo -std=c++17 -c '/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/cuda_func/half_matmul.cu' -o half_matmul.cuda.o
FAILED: half_matmul.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=exllama_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/TH -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -lineinfo -std=c++17 -c '/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/cuda_func/half_matmul.cu' -o half_matmul.cuda.o
In file included from /Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/cuda_func/half_matmul.cuh:7,
                 from /Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/cuda_func/half_matmul.cu:1:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
    6 | #include <cusparse.h>
      |          ^~~~~~~~~~~~
compilation terminated.
[4/6] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=exllama_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/TH -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -lineinfo -std=c++17 -c '/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/cuda_func/q4_attn.cu' -o q4_attn.cuda.o
FAILED: q4_attn.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=exllama_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/TH -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -lineinfo -std=c++17 -c '/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/cuda_func/q4_attn.cu' -o q4_attn.cuda.o
In file included from /Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/cuda_func/q4_mlp.cuh:7,
                 from /Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/cuda_func/q4_attn.cu:1:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
    6 | #include <cusparse.h>
      |          ^~~~~~~~~~~~
compilation terminated.
[5/6] c++ -MMD -MF exllama_ext.o.d -DTORCH_EXTENSION_NAME=exllama_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/TH -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -c '/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/exllama_ext.cpp' -o exllama_ext.o
FAILED: exllama_ext.o
c++ -MMD -MF exllama_ext.o.d -DTORCH_EXTENSION_NAME=exllama_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/TH -isystem /local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -c '/Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/exllama_ext.cpp' -o exllama_ext.o
In file included from /Workspace/Repos/michael.koch@unifyconsulting.com/llm-code-converter/exllama/exllama_ext/exllama_ext.cpp:3:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a6adfedc-4be5-462e-a391-30ebe4274fda/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
    6 | #include <cusparse.h>
      |          ^~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.

Kumaran · ‎07-21-2023

Hi @m_koch_unify,

Thank you for posting the question in the Databricks community.

This issue may occur because the cuSPARSE package cannot be found. Databricks has removed several uncommon packages since MLR 8.1, and cuSPARSE is one of them. If you need the cuSPARSE library, you can install it manually with the init script below:

#!/bin/bash

set -e

# The root directory which has the necessary code to initialize the cluster during cluster init.  This should ideally
# be a user directory that was synced over according to the README.md.
DS_PROJECTS_ROOT=${DS_PROJECTS_ROOT:-/dbfs/data-mle/llm/ds-projects}

# This points to the requirements that should be installed during cluster init.  The requirements we install may change
# depending on the notebook.
export REQUIREMENTS_NAME=${REQUIREMENTS_NAME:-deepspeed.in}
export REQUIREMENTS_PATH=${REQUIREMENTS_PATH:-$DS_PROJECTS_ROOT/app/llm/requirements/$REQUIREMENTS_NAME}

# This points to the path where the Python environment should be cached in DBFS after pip install completes.  The
# default path here should generally be fine.
export PYTHON_ENV_CACHE=${PYTHON_ENV_CACHE:-$DS_PROJECTS_ROOT/python-env-cache-$DATABRICKS_RUNTIME_VERSION}

echo Root dir: $DS_PROJECTS_ROOT
echo Requirements path: $REQUIREMENTS_PATH
echo Python env cache: $PYTHON_ENV_CACHE

if [ -f "/usr/local/cuda/bin/nvcc" ]; then

    # Needed by deepspeed for distributed training.
    apt-get install -y pdsh expect

    # When developing CUDA applications, it is often necessary to include the CUDA headers in your source code.
    # The CUDA headers are a collection of C/C++ header files that define the interfaces to the CUDA libraries and runtime.
    # The header files provide declarations for functions, constants, and data types that are used in CUDA programming.
    # By including these headers in your source code, you can access the CUDA API and use the functions and data types
    # defined in the headers.

    # Databricks Runtime 13.x uses CUDA 11.7
    if /usr/local/cuda/bin/nvcc -V | grep -q cuda_11.7; then
        echo Found CUDA 11.7
        # TODO should these be dev version as with 11.3?
        export libcusparse=libcusparse-11-7_11.7.4.91-1_amd64.deb
        export libcublas=libcublas-11-7_11.10.3.66-1_amd64.deb
        export libcusolver=libcusolver-11-7_11.4.0.1-1_amd64.deb
        export libcurand=libcurand-11-7_10.2.10.91-1_amd64.deb

    # Databricks Runtime 12.x uses CUDA 11.3
    elif /usr/local/cuda/bin/nvcc -V | grep -q cuda_11.3; then
        echo Found CUDA 11.3
        export libcusparse=libcusparse-dev-11-3_11.5.0.58-1_amd64.deb
        export libcublas=libcublas-dev-11-3_11.5.1.109-1_amd64.deb
        export libcusolver=libcusolver-dev-11-3_11.1.2.109-1_amd64.deb
        export libcurand=libcurand-dev-11-3_10.2.4.109-1_amd64.deb

    else
        /usr/local/cuda/bin/nvcc -V
        echo Unsupported cuda version
        exit 1
    fi

    # includes the headers and static libraries for the cuSPARSE library, used for sparse matrix operations.
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/$libcusparse \
        -O /tmp/libcusparse.deb && \
        dpkg -i /tmp/libcusparse.deb

    # includes the headers and static libraries for the cuBLAS library, used for linear algebra operations.
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/$libcublas \
        -O /tmp/libcublas.deb && \
        dpkg -i /tmp/libcublas.deb

    # includes the headers and static libraries for the cuSOLVER library, used for solving dense and sparse linear systems.
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/$libcusolver \
        -O /tmp/libcusolver.deb && \
        dpkg -i /tmp/libcusolver.deb

    # includes the headers and static libraries for the cuRAND library, used for random number generation.
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/$libcurand \
        -O /tmp/libcurand.deb && \
        dpkg -i /tmp/libcurand.deb

    # Create an environment variables file that deepspeed will apply on the worker nodes.
    # By default the PATH will not include the /databricks/python3/bin directory, which is where all the Python
    # binaries we use are installed.  For example, without updating the PATH, "ninja" won't be found and
    # pytorch will have a compilation failure.
    echo "PATH=/databricks/python3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" >> /root/.deepspeed_env
fi

# Save the cluster ID and workspace ID to a file which we can conveniently use later when starting the service to
# set environment variables it requires.
echo "export DB_CLUSTER_ID=\"$DB_CLUSTER_ID\"" >> /root/.envinfo
echo "export DB_CLUSTER_NAME=\"$DB_CLUSTER_NAME\"" >> /root/.envinfo
echo "export WORKSPACE_ID=2548836972759138" >> /root/.envinfo
echo "export REQUIREMENTS_NAME=\"$REQUIREMENTS_NAME\"" >> /root/.envinfo
echo "export REQUIREMENTS_PATH=\"$REQUIREMENTS_PATH\"" >> /root/.envinfo
echo "export PYTHON_ENV_CACHE=\"$PYTHON_ENV_CACHE\"" >> /root/.envinfo
echo "export DS_PROJECTS_ROOT=\"$DS_PROJECTS_ROOT\"" >> /root/.envinfo

# This is similar to the init scripts under deploy/resources except it uses paths under dbfs/data-mle/llm,
# or a root configured with DS_PROJECTS_ROOT.
# This enables us to ship more up-to-date requirements and cache independently from other code in logfood.

/databricks/python/bin/python $DS_PROJECTS_ROOT/deploy/resources/pip_installs.py

if [ -d "/databricks/python3-restored" ]; then
   rm -rf  /databricks/python3
   echo "Replacing /databricks/python3"
   mv /databricks/python3-restored /databricks/python3
fi
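The version check in the script above (grep `nvcc -V` for a CUDA release, then pick a package set) can be sketched in Python for use from a notebook. This is only an illustration: the package filenames are copied from the script, the function names `cuda_release` and `packages_for` are hypothetical, and the regular expression assumes the `nvcc -V` output contains a build tag of the form `cuda_11.7`, as the script's own `grep` does.

```python
import re

# Package filenames copied verbatim from the init script, keyed by CUDA release.
PACKAGES = {
    "11.7": [
        "libcusparse-11-7_11.7.4.91-1_amd64.deb",
        "libcublas-11-7_11.10.3.66-1_amd64.deb",
        "libcusolver-11-7_11.4.0.1-1_amd64.deb",
        "libcurand-11-7_10.2.10.91-1_amd64.deb",
    ],
    "11.3": [
        "libcusparse-dev-11-3_11.5.0.58-1_amd64.deb",
        "libcublas-dev-11-3_11.5.1.109-1_amd64.deb",
        "libcusolver-dev-11-3_11.1.2.109-1_amd64.deb",
        "libcurand-dev-11-3_10.2.4.109-1_amd64.deb",
    ],
}


def cuda_release(nvcc_output):
    """Extract 'major.minor' from `nvcc -V` output, or None if no tag is found."""
    match = re.search(r"cuda_(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def packages_for(nvcc_output):
    """Return the .deb filenames the script would download for this CUDA release."""
    release = cuda_release(nvcc_output)
    if release not in PACKAGES:
        raise ValueError(f"Unsupported CUDA version: {release}")
    return PACKAGES[release]
```

In a notebook, the input could come from `subprocess.run(["/usr/local/cuda/bin/nvcc", "-V"], capture_output=True, text=True).stdout`.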

m_koch_unify · ‎07-21-2023

Hi @Kumaran,

Thanks so much for the quick reply. When I run the script with

!bash install_cusparse.sh

It runs for a bit but ultimately hits an error. When I run !ls -l, I don't even see a data-mle directory in DBFS.

Here is the full output from running the script:

Root dir: /dbfs/data-mle/llm/ds-projects
Requirements path: /dbfs/data-mle/llm/ds-projects/app/llm/requirements/deepspeed.in
Python env cache: /dbfs/data-mle/llm/ds-projects/python-env-cache-13.1
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
pdsh is already the newest version (2.31-3build2).
The following NEW packages will be installed:
  expect tcl-expect
0 upgraded, 2 newly installed, 0 to remove and 4 not upgraded.
Need to get 242 kB of archives.
After this operation, 549 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 tcl-expect amd64 5.45.4-2build1 [105 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/universe amd64 expect amd64 5.45.4-2build1 [137 kB]
Fetched 242 kB in 1s (266 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package tcl-expect:amd64.
(Reading database ... 101052 files and directories currently installed.)
Preparing to unpack .../tcl-expect_5.45.4-2build1_amd64.deb ...
Unpacking tcl-expect:amd64 (5.45.4-2build1) ...
Selecting previously unselected package expect.
Preparing to unpack .../expect_5.45.4-2build1_amd64.deb ...
Unpacking expect (5.45.4-2build1) ...
Setting up tcl-expect:amd64 (5.45.4-2build1) ...
Setting up expect (5.45.4-2build1) ...
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.1) ...
Found CUDA 11.7
--2023-07-21 23:31:49--  https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusparse-11-7_11.7.4.91-1_amd64.deb
Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 102711722 (98M) [application/x-deb]
Saving to: ‘/tmp/libcusparse.deb’

/tmp/libcusparse.de 100%[===================>]  97.95M  97.9MB/s    in 1.0s    

2023-07-21 23:31:50 (97.9 MB/s) - ‘/tmp/libcusparse.deb’ saved [102711722/102711722]

(Reading database ... 101132 files and directories currently installed.)
Preparing to unpack /tmp/libcusparse.deb ...
Unpacking libcusparse-11-7 (11.7.4.91-1) over (11.7.3.50-1) ...
Setting up libcusparse-11-7 (11.7.4.91-1) ...
--2023-07-21 23:31:58--  https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcublas-11-7_11.10.3.66-1_amd64.deb
Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 209757218 (200M) [application/x-deb]
Saving to: ‘/tmp/libcublas.deb’

/tmp/libcublas.deb  100%[===================>] 200.04M   125MB/s    in 1.6s    

2023-07-21 23:32:00 (125 MB/s) - ‘/tmp/libcublas.deb’ saved [209757218/209757218]

(Reading database ... 101132 files and directories currently installed.)
Preparing to unpack /tmp/libcublas.deb ...
Unpacking libcublas-11-7 (11.10.3.66-1) over (11.10.1.25-1) ...
Setting up libcublas-11-7 (11.10.3.66-1) ...
--2023-07-21 23:32:17--  https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusolver-11-7_11.4.0.1-1_amd64.deb
Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 45644430 (44M) [application/x-deb]
Saving to: ‘/tmp/libcusolver.deb’

/tmp/libcusolver.de 100%[===================>]  43.53M  99.8MB/s    in 0.4s    

2023-07-21 23:32:17 (99.8 MB/s) - ‘/tmp/libcusolver.deb’ saved [45644430/45644430]

(Reading database ... 101132 files and directories currently installed.)
Preparing to unpack /tmp/libcusolver.deb ...
Unpacking libcusolver-11-7 (11.4.0.1-1) over (11.4.0.1-1) ...
Setting up libcusolver-11-7 (11.4.0.1-1) ...
--2023-07-21 23:32:26--  https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcurand-11-7_10.2.10.91-1_amd64.deb
Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 152.195.19.142
Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|152.195.19.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 41549148 (40M) [application/x-deb]
Saving to: ‘/tmp/libcurand.deb’

/tmp/libcurand.deb  100%[===================>]  39.62M   111MB/s    in 0.4s    

2023-07-21 23:32:26 (111 MB/s) - ‘/tmp/libcurand.deb’ saved [41549148/41549148]

(Reading database ... 101132 files and directories currently installed.)
Preparing to unpack /tmp/libcurand.deb ...
Unpacking libcurand-11-7 (10.2.10.91-1) over (10.2.10.91-1) ...
Setting up libcurand-11-7 (10.2.10.91-1) ...
/databricks/python/bin/python: can't open file '/dbfs/data-mle/llm/ds-projects/deploy/resources/pip_installs.py': [Errno 2] No such file or directory
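Reading the output above, the four .deb installs appear to have completed, and only the final `pip_installs.py` step failed, because that file is expected under a `DS_PROJECTS_ROOT` path that does not exist in this workspace. A small sketch for pulling the installed packages out of such a log follows; the function names are hypothetical. It also flags CUDA library packages that are not `-dev` variants, on the assumption (consistent with the TODO in the script, but not confirmed in this thread) that headers such as cusparse.h ship in the `-dev` packages.

```python
import re

# Matches dpkg lines such as: "Setting up libcusparse-11-7 (11.7.4.91-1) ..."
SETUP_LINE = re.compile(r"^Setting up (\S+) \(([^)]+)\) \.\.\.[ \t]*$", re.MULTILINE)


def installed_packages(log_text):
    """Return (name, version) pairs from dpkg 'Setting up ...' lines."""
    return SETUP_LINE.findall(log_text)


def runtime_only_cuda_packages(packages, prefix="libcu"):
    """Names of CUDA library packages in the list that are not '-dev' variants."""
    return [name for name, _version in packages
            if name.startswith(prefix) and "-dev" not in name]


if __name__ == "__main__":
    # Two lines copied from the output above.
    log = (
        "Setting up expect (5.45.4-2build1) ...\n"
        "Setting up libcusparse-11-7 (11.7.4.91-1) ...\n"
    )
    pkgs = installed_packages(log)
    print(pkgs)
    print(runtime_only_cuda_packages(pkgs))
```

If the second list is non-empty, it may be worth checking whether cusparse.h is present under /usr/local/cuda/include before rerunning the benchmark.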

Running test inference on Llama-2-70B-chat-GPTQ… are C++ libraries installed correctly?
