We have built a chat solution based on an LLM RAG model, but we run into an issue when spinning up a serving endpoint to host the model.
According to the documentation, several LLM models should be available as pay-per-token endpoints, for instance DBRX Instruct:
https://learn.microsoft.com/en-us/azure/databricks/machine-learning/foundation-models/supported-mode...
However, in our workspace we only see two available pay-per-token endpoints (see attachment "serving endpoints.png").
When we try to "create a new serving endpoint", it seems we can only spin up provisioned throughput models, which are currently too expensive for our setup (see attachment "issue.png").
Our Databricks environment is in Azure West Europe.
Any suggestions?