DBRX - Serving endpoint failed - update timed out.
06-02-2024 11:29 PM
Hi,
https://notebooks.databricks.com/demos/llm-rag-chatbot/index.html
Following this tutorial, I'm trying to serve an endpoint with the DBRX model connected to my data in a vector DB.
I can log my model in Databricks with MLflow and call it locally from notebooks without any problem, but when I try to serve the endpoint it always fails after about 35-40 minutes with the message:
OperationFailed: failed to reach NOT_UPDATING, got EndpointStateConfigUpdate.UPDATE_FAILED: current status: EndpointStateConfigUpdate.UPDATE_FAILED
In the create_and_wait() method I set the timeout parameter to two hours so that the call doesn't stop after the default 20 minutes, like so:
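A minimal sketch of such a call with the Databricks SDK for Python, assuming the `databricks-sdk` package and a model registered in Unity Catalog. The function name `create_endpoint`, the constant `DEPLOY_TIMEOUT`, and the workload settings are hypothetical placeholders, not the values used in this post:

```python
import datetime

# Two hours instead of the SDK's default 20-minute wait.
DEPLOY_TIMEOUT = datetime.timedelta(hours=2)


def create_endpoint(endpoint_name: str, model_name: str, model_version: str):
    """Create a serving endpoint and block until it is ready or the wait times out."""
    # Imported here so this module still loads where databricks-sdk is not installed.
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.serving import (
        EndpointCoreConfigInput,
        ServedEntityInput,
    )

    config = EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name=model_name,      # placeholder: catalog.schema.model
                entity_version=model_version,
                workload_size="Small",       # assumption, not from the post
                scale_to_zero_enabled=True,  # assumption, not from the post
            )
        ]
    )
    # Picks up credentials from the notebook context or environment.
    client = WorkspaceClient()
    return client.serving_endpoints.create_and_wait(
        name=endpoint_name,
        config=config,
        timeout=DEPLOY_TIMEOUT,
    )
```

The timeout here only bounds how long the client keeps polling. An UPDATE_FAILED state is reported by the serving service itself, so raising the timeout would not be expected to change that outcome.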
Screenshots from the Serving tab in Databricks:
In the service logs I can also see some exceptions raised by conda:
Any idea how to solve the issue?
06-04-2024 03:59 AM
Thank you for the answer.
- The model is located in Unity Catalog like so:
- The model isn't deployed yet, so I can't check health metrics.
- I don't use Azure DevOps.
- I've implemented 5 retries (the first run creates the endpoint, the next ones try to update it), but they all produce the same error. Each attempt seems to fail after a similar period of time:
- If I understand correctly, model serving does not run on my cluster, where I could set environment variables (please correct me if I'm wrong). I can create the endpoint from the UI with the cluster off, and none of my clusters are running at that time:
So where can I set the variable?
I've set the variable in code before calling create_and_wait(), but I'm not sure that's correct.
- Any other ideas?
- What about the conda exceptions during deployment? How can I debug them?
As a test, I also served a simple linear regression model. That endpoint was created successfully and works fine.
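For reference, the retry approach mentioned above (create on the first attempt, update on later ones) can be sketched roughly as follows. This is a generic illustration, not the exact code that was run: `deploy_with_retries` and its parameters are hypothetical names, and `create` and `update` stand in for whatever callables wrap the endpoint creation and config update.

```python
import time
from typing import Any, Callable, Optional


def deploy_with_retries(
    create: Callable[[], Any],
    update: Callable[[], Any],
    attempts: int = 5,
    delay_seconds: float = 0.0,
) -> Any:
    """Call create() on the first attempt and update() on every later one.

    Returns the result of the first call that succeeds and raises
    RuntimeError if all attempts fail.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        action = create if attempt == 1 else update
        try:
            return action()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            print(f"attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(f"deployment failed after {attempts} attempts") from last_error
```

Since every attempt deploys the same model and environment, a deterministic build failure (such as a conda environment error) would be expected to repeat on each retry, which matches the behaviour described above.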

