Model serving endpoint API error - Mosaic AI Agents API

JingXie
New Contributor

Hi, 

I have built a chatbot app using Streamlit (through the Databricks Apps UI). The chatbot backend is a custom RAG model built by following the example notebook 02-Deploy-RAG-Chatbot-Model.

The needed Databricks packages are installed:

databricks-vectorsearch==0.56
mlflow==2.20.2
databricks-langchain==0.5.0
databricks-agents==0.20.0
databricks-sdk==0.50.0
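
For reproducibility, that's the pinned set below:

pip install databricks-vectorsearch==0.56 mlflow==2.20.2 databricks-langchain==0.5.0 databricks-agents==0.20.0 databricks-sdk==0.50.0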
 
The RAG chain uses databricks-gte-large-en as the embedding model and databricks-meta-llama-3-1-70b-instruct as the LLM generator.
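
For context, the chain is wired up roughly like this (a minimal sketch following the dbdemos pattern; the index name and column names below are placeholders, not my real ones):

# Minimal sketch of the RAG chain; index/column names are placeholders.
from databricks_langchain import (
    ChatDatabricks,
    DatabricksEmbeddings,
    DatabricksVectorSearch,
)
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Embedding model used at query time (self-managed embeddings).
embeddings = DatabricksEmbeddings(endpoint="databricks-gte-large-en")

# Vector Search index in Unity Catalog (placeholder name).
retriever = DatabricksVectorSearch(
    index_name="main.rag.docs_index",
    embedding=embeddings,
    text_column="content",
).as_retriever(search_kwargs={"k": 3})

# Generator LLM served on Databricks.
llm = ChatDatabricks(endpoint="databricks-meta-llama-3-1-70b-instruct")

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question using only this context:\n\n{context}"),
    ("human", "{question}"),
])

def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)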
I registered the RAG chain to Unity Catalog and deployed it with Model Serving. Once the endpoint is deployed and ready, the chatbot UI can query it. Since the chatbot app was created through Databricks Apps, it uses a Service Principal by default to access the RAG model serving endpoint. I also implemented a Token Manager class that refreshes the access token before it expires.
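
The Token Manager is essentially the standard Databricks OAuth machine-to-machine flow (a minimal sketch; the host, client ID, and secret are placeholders):

import time
import requests

class TokenManager:
    """Caches a service-principal OAuth token and refreshes it shortly
    before it expires (Databricks OAuth tokens last one hour)."""

    def __init__(self, host, client_id, client_secret, skew_seconds=300):
        self.host = host.rstrip("/")          # e.g. https://dbc-xxxxx.cloud.databricks.com
        self.client_id = client_id            # service principal application ID
        self.client_secret = client_secret    # service principal OAuth secret
        self.skew_seconds = skew_seconds      # refresh margin before expiry
        self._token = None
        self._expires_at = 0.0

    def get_token(self):
        if self._token is None or time.time() >= self._expires_at - self.skew_seconds:
            resp = requests.post(
                f"{self.host}/oidc/v1/token",
                auth=(self.client_id, self.client_secret),
                data={"grant_type": "client_credentials", "scope": "all-apis"},
            )
            resp.raise_for_status()
            payload = resp.json()
            self._token = payload["access_token"]
            self._expires_at = time.time() + payload["expires_in"]
        return self._token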
 
However, every time the RAG model endpoint comes up (either after a restart or after scaling from zero), it only works for one hour. After one hour, my chatbot UI can no longer query the endpoint and receives this error message:


HTTP error occurred: 400 Client Error: Bad Request for url: https://dbc-xxxxx.cloud.databricks.com/serving-endpoints/chatbot_poc/invocations

🔢 Status code: N/A
📄 Response text: No response text
📦 Parsed JSON (if available):
{'error_code': 'BAD_REQUEST', 'message': '1 tasks failed. Errors: {0: 'error: Exception("Response content b\'Invalid Token\', status_code 400")
Traceback (most recent call last):
  File "/opt/conda/envs/mlflow-env/lib/python3.12/site-packages/databricks/vector_search/utils.py", line 128, in issue_request
    response.raise_for_status()
  File "/opt/conda/envs/mlflow-env/lib/python3.12/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://dbcxxx.cloud.databricks.com/api/2.0/serving-endpoints/databricks-gte-large-en

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/mlflow-env/lib/python3.12/site-packages/mlflow/langchain/api_request_parallel_processor.py",

 

If I open that Bad Request URL directly, I get this error message:
{"error_code":401,"message":"Credential was not sent or was of an unsupported type for this API. [ReqId: xxxxx-xxxx]"}
 
I have tried everything I could find but still cannot diagnose the root cause. It seems to be some MLflow environment configuration related to the databricks-gte-large-en model serving endpoint, but that endpoint is not managed by me and I could not find the source code behind this API. I also tried using a PAT to query the RAG endpoint and hit the same error.
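
For reference, the PAT test was just a direct call to the invocations endpoint (a minimal sketch; the token is a placeholder and the exact JSON input schema depends on how the chain was logged):

import requests

DATABRICKS_HOST = "https://dbc-xxxxx.cloud.databricks.com"  # placeholder workspace URL
PAT = "<personal-access-token>"                             # placeholder

resp = requests.post(
    f"{DATABRICKS_HOST}/serving-endpoints/chatbot_poc/invocations",
    headers={"Authorization": f"Bearer {PAT}"},
    # A chat-style payload is shown as an example; the required schema
    # depends on the model signature.
    json={"messages": [{"role": "user", "content": "test question"}]},
)
print(resp.status_code, resp.text)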
 
Could you help me diagnose this error? I need to deploy this chatbot app to our internal team, and a RAG model serving endpoint that fails after one hour and has to be restarted manually is definitely not a solution we can use.
 
Thanks!
 
Regards,
Jing Xie 