Provisioned Throughput is not supported for llama...

c4ndy
New Contributor

Hi,
This question is tightly correlated with another discussion: Model deprecation issue while serving on Databrick... - Databricks Community - 131968

In a nutshell, I'm trying to serve a model that is based on the Llama architecture (deployed through MLflow transformers-model logging), but it isn't Llama itself (at least not with the same parameter count).

The following error is thrown while trying to create a provisioned throughput endpoint:
"Provisioned Throughput is not supported for llama with 7b parameters and 32768 context length - please reach out to support@databricks.com!"

Is there a way to mitigate this error?
It seems to be a metadata misassumption: the endpoint reports 7B parameters, while the model actually has ~4.5B.
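For anyone debugging something similar: the architecture and context-length values in that error message presumably come from the model's config.json. Here is a minimal sketch (field names match Hugging Face's LlamaConfig; the values below are placeholders, not Bielik's actual ones) of the fields a serving layer would likely read to classify the model:

```python
# Illustrative config.json contents for a Llama-architecture model.
# Field names follow Hugging Face's LlamaConfig; values are placeholders.
sample_config = {
    "architectures": ["LlamaForCausalLM"],
    "hidden_size": 3072,
    "num_hidden_layers": 32,
    "num_attention_heads": 24,
    "max_position_embeddings": 32768,  # this is the "context length" in the error
}

def serving_metadata(config: dict) -> dict:
    """Extract the fields an endpoint would typically use to classify a model."""
    return {
        "architecture": config["architectures"][0],
        "context_length": config["max_position_embeddings"],
    }

print(serving_metadata(sample_config))
```

Comparing these fields from the logged model's artifacts against what the error reports might show where the misclassification happens.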

model ref: speakleash/Bielik-4.5B-v3.0-Instruct · Hugging Face

Thanks in advance.