
Importing sentence-transformers no longer works on Databricks Runtime 17.2 ML

excavator-matt
New Contributor III

In Databricks Runtime 16.4 LTS for Machine Learning, I have been able to import sentence-transformers without any installation, as it is part of the runtime, using from sentence_transformers import SentenceTransformer.
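
For reference, this is the kind of out-of-the-box usage I mean (the checkpoint name here is just an example):

from sentence_transformers import SentenceTransformer

# Works out of the box on 16.4 LTS ML; the checkpoint name is only an example.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(["Databricks", "Runtime 16.4 LTS ML"])
print(embeddings.shape)  # e.g. (2, 384) for this model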

In this case I am running on a personal compute cluster, and although the import generates some warnings that I don't know how to interpret, it runs:

2025-09-03 09:39:48.853218: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-09-03 09:39:48.869748: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-09-03 09:39:48.890311: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-09-03 09:39:48.896580: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-09-03 09:39:48.911486: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-09-03 09:39:50.080574: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

However, if I upgrade my cluster to the latest 17.2 ML (Beta) runtime, I instead get this crash:

Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
/databricks/python/lib/python3.12/site-packages/flash_attn_2_cuda.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationESs
[Trace ID: 00-91c7cd652996226fe8747ad97efc53e6-f7a274cbb6f58e49-00]

Both runtimes include sentence-transformers (version 3.4.1 versus 4.0.1), so it should work. Is this no longer supported? Is it known to be broken in the beta?

7 REPLIES

Khaja_Zaffer
Contributor III

Hello @excavator-matt 

Which versions of PyTorch and CUDA are you using? Which version of flash-attention are you using?
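
You can print them from a notebook cell; something like this sketch:

import torch
print("torch:", torch.__version__, "CUDA:", torch.version.cuda)

try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)
except Exception as e:  # the broken wheel may fail right here on 17.2
    print("flash-attn import failed:", e)

# These two may themselves crash on 17.2 ML Beta (the bug above).
import transformers, sentence_transformers
print("transformers:", transformers.__version__)
print("sentence-transformers:", sentence_transformers.__version__)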

If there's a matching wheel for your environment, you can download it from https://github.com/Dao-AILab/flash-attention/releases/

I simply run the import in the runtime, so it would be the runtime defaults (CUDA 12.6, torch 2.7.0, and flash_attn 2.7.4.post1). I haven't tried overriding any of these. Not sure if there is a known incompatibility here.
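
For what it's worth, demangling the missing symbol points at a libtorch symbol, which would suggest the bundled flash_attn wheel was built against a different torch C++ ABI than the runtime's torch (my interpretation, not confirmed):

%sh
# c++filt demangles the symbol from the crash; the trailing "Ss" is the
# pre-C++11 std::string, so the wheel seems to expect the old libstdc++ ABI.
echo '_ZN3c105ErrorC2ENS_14SourceLocationESs' | c++filt
# -> c10::Error::Error(c10::SourceLocation, std::string)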

I am trying to find a solution for your issue, but since this morning I have been unable to use Databricks Community Edition; the cluster just spins without an error.


So after running the upgrade, I was able to use:

model="Qwen/Qwen2.5-1.5B-Instruct"
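
In case the exact sequence helps, this is roughly what I ran (I am assuming the transformers pipeline API here for the test; adjust to your own loading code):

%pip install --upgrade transformers
# Restart Python so the upgraded package is picked up by this notebook.
dbutils.library.restartPython()

# In a new cell: a small smoke test with the model above.
from transformers import pipeline
pipe = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B-Instruct")
print(pipe("Hello!", max_new_tokens=20)[0]["generated_text"])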

Khaja_Zaffer
Contributor III

I just checked. Run:

pip install --upgrade transformers

I believe you are working on RAG? 

excavator-matt
New Contributor III

I am not sure if it is possible to replace sentence-transformers with plain transformers, but I think it should be possible to import sentence-transformers, since it is an official part of the runtime. In this case, I was more interested in the resulting vectors from sentence-transformers' encode than in any generated reply.

Hello @excavator-matt 

Thanks for your response. 

I don't have a workspace to test on a 17.2 Beta ML cluster. I highly recommend using an LTS-based cluster.

However, you can try upgrading transformers and then importing sentence_transformers.

After the pip install --upgrade transformers, run from sentence_transformers import SentenceTransformer again in your 17.2 ML Beta cluster. If it still crashes with the undefined symbol error, the upgrade alone might not have fixed the flash-attn compatibility issue (as transformers 4.41.2 might still rely on the runtime's pre-built flash-attn 2.7.4.post1).
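
Putting it together, a sketch of that retest (untested on 17.2 Beta; the checkpoint name is just an example, and the fallback assumes transformers falls back to standard attention when flash-attn is absent):

%pip install --upgrade transformers
# Restart Python so the upgraded transformers is actually imported.
dbutils.library.restartPython()

# In a new cell: retry the import that crashed, plus a tiny encode.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative checkpoint
print(model.encode(["smoke test"]).shape)

# If the undefined-symbol crash persists, removing the broken wheel may be
# enough, since flash-attn is an optional dependency for transformers:
# %pip uninstall -y flash-attn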

Thank you.