Databricks Community

phi_alpaca · 02-15-2024

Hello, I've been trying to serve registered MLflow models on a GPU Model Serving endpoint, which works except for models that use the bitsandbytes library. The library is used to quantise LLMs to 4-bit/8-bit (e.g. Mistral-7B); however, it runs...

phi_alpaca · 03-07-2024

I seem to have some compatibility issues with cudatoolkit=11.8. Would it be possible for you to share which versions you use for torch, transformers, accelerate, and bitsandbytes? Thanks!
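For anyone comparing environments in this thread, here is a minimal sketch for listing the installed versions of the four packages mentioned above. It uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is just an illustration, and it does not report the CUDA toolkit version.

```python
from importlib import metadata

# Packages named in the question above.
PACKAGES = ["torch", "transformers", "accelerate", "bitsandbytes"]


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version,
    or None if the package is not installed in this environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in both the notebook environment and the environment used to log the model should make any version mismatch easy to spot.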

phi_alpaca · 02-27-2024

Thanks so much for sharing, and glad it worked out for you guys! I will have a go and feed back.

phi_alpaca · 02-22-2024

Hey @JAgreenskylake, no luck so far. I have been working around it by not using quantised models, which is not ideal, so I really hope it's possible to do that soon.

phi_alpaca · 02-20-2024

Hey @G-M, thanks for sharing your experience as well. Unfortunately I haven't had any luck on my end resolving this. I'd be interested to know if you have any breakthroughs down the line. Is it something Databricks might be able to put a small ...

Error at model serving for quantised models using bitsandbytes library

Re: Error at model serving for quantised models using bitsandbytes library
