Hi @JingXie,
This is a classic authentication token issue, but it's happening in a different place than you might think. The Invalid Token error is not coming from your Streamlit App's call to your chatbot_poc endpoint.
It's coming from inside your chatbot_poc endpoint when it tries to call the databricks-gte-large-en embedding model endpoint.
Here is the chain of events and where the failure occurs:
- Streamlit App to RAG Endpoint: Your Streamlit App (running as a Databricks App) uses its Service Principal (SP) and your Token Manager to get a token. It successfully sends a request to .../chatbot_poc/invocations. This part works.
- RAG Endpoint to Embedding Endpoint:
  - Your chatbot_poc model (the pyfunc RAG chain) receives the request.
  - To process the query, it must first embed the user's text. To do this, it makes its own API call to the databricks-gte-large-en foundation model endpoint.
  - This is where it fails. The error 400 Client Error: Bad Request for url: https://dbcxxx.cloud.databricks.com/api/2.0/serving-endpoints/databricks-gte-large-en with the content b'Invalid Token' proves this.
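You can confirm this diagnosis by calling the embedding endpoint directly with a freshly generated token. If a sketch like the one below succeeds while your RAG endpoint keeps failing, the stale internal token is the culprit (the host and token values are placeholders for your workspace):

```python
# Sketch: call the embedding endpoint directly with a fresh token.
import requests

DATABRICKS_HOST = "https://dbcxxx.cloud.databricks.com"  # your workspace URL
FRESH_TOKEN = "<freshly generated personal access token>"

resp = requests.post(
    f"{DATABRICKS_HOST}/serving-endpoints/databricks-gte-large-en/invocations",
    headers={"Authorization": f"Bearer {FRESH_TOKEN}"},
    json={"input": ["test sentence to embed"]},
)
print(resp.status_code, resp.text[:200])
```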
The root cause is that the identity (token) your chatbot_poc endpoint is using to call the embedding endpoint expires after one hour. This identity was likely "baked in" when you logged the model to MLflow. You probably initialized the DatabricksEmbeddings client in your notebook, which captured your notebook's temporary 1-hour token. When the model serving endpoint starts, it uses this stale, captured token, which works for an hour and then dies.
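For illustration, the kind of notebook code that bakes in a short-lived token looks something like this (a hypothetical sketch; dbutils only exists inside a notebook):

```python
# Anti-pattern (hypothetical sketch): capturing the notebook's 1-hour token.
import os

# Grab the notebook session's short-lived (~1 hour) API token...
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
os.environ["DATABRICKS_TOKEN"] = ctx.apiToken().get()

# ...so any Databricks client created from here on (DatabricksEmbeddings,
# ChatDatabricks, etc.) silently picks up that token, and it can end up
# captured with the logged model. It works for an hour, then dies.
```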
Cell 8 of the 02-Deploy-RAG-Chatbot-Model notebook even warns about relying on notebook auth (see attached screenshot).
Your Streamlit app's Token Manager is irrelevant here; it only manages the token for step 1. You need to fix the token for step 2.
Recommended Solution: Use a Service Principal for Model Serving
The most robust and standard solution is to configure your chatbot_poc Model Serving endpoint to "run as" a Service Principal.
This gives the endpoint its own identity. When it needs to call other Databricks services (like the embedding endpoint or the Llama 3 endpoint), it will use this SP identity to automatically generate a fresh, short-lived token for every single call. This completely eliminates the 1-hour expiry problem.
Step-by-Step Fix
Here’s how to implement this fix:
1. Ensure You Have a Service Principal (SP)
You probably already have one for your Databricks App. You can use the same one. If not, create a new one and record its Application ID.
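If you'd rather script this, here is a minimal sketch using the Databricks SDK for Python, assuming the SDK resolves auth from your environment and that the display name is just an example:

```python
# Sketch: create a Service Principal with the Databricks SDK (databricks-sdk).
# Requires workspace admin rights.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # picks up auth from env vars or a CLI profile
sp = w.service_principals.create(display_name="chatbot-poc-serving-sp")
print("Application ID:", sp.application_id)  # record this for the next steps
```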
2. Grant the SP "Can Query" Permissions
This is the most critical step. Your Service Principal needs permission to call the other model endpoints.
- Go to Machine Learning > Serving.
- Find the databricks-gte-large-en endpoint.
- Click it, then go to the Permissions tab.
- Add your Service Principal and give it the Can Query permission.
- Repeat this process for your LLM endpoint: databricks-meta-llama-3-1-70b-instruct.
- Repeat this process for your Vector Search endpoint (if it's a separate serving endpoint). If you'd rather script these grants, see the SDK sketch just after this list.
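Here is the scripted alternative mentioned above, a sketch assuming a recent databricks-sdk (double-check the method and enum names against your SDK version):

```python
# Sketch: grant Can Query to the SP on each endpoint via the Databricks SDK.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import (
    ServingEndpointAccessControlRequest,
    ServingEndpointPermissionLevel,
)

w = WorkspaceClient()
SP_APP_ID = "<your-service-principal-application-id>"  # from Step 1

for name in ["databricks-gte-large-en", "databricks-meta-llama-3-1-70b-instruct"]:
    endpoint = w.serving_endpoints.get(name)
    # update_permissions adds to the existing ACL rather than replacing it
    w.serving_endpoints.update_permissions(
        serving_endpoint_id=endpoint.id,
        access_control_list=[
            ServingEndpointAccessControlRequest(
                service_principal_name=SP_APP_ID,
                permission_level=ServingEndpointPermissionLevel.CAN_QUERY,
            )
        ],
    )
```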
3. Re-log Your RAG Chain (A Quick Check)
This is a sanity check to ensure you aren't hardcoding any tokens. Inside your model-logging notebook, the LangChain components should be initialized without any explicit host or token, along these lines (a sketch based on the dbdemos pattern; adjust the imports to match your LangChain version):
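```python
# Sketch: newer setups import these from the databricks-langchain package
# instead of langchain_community.
from langchain_community.embeddings import DatabricksEmbeddings
from langchain_community.chat_models import ChatDatabricks

# No host/token arguments: credentials are resolved from the serving
# environment at request time, not baked in at logging time.
embedding_model = DatabricksEmbeddings(endpoint="databricks-gte-large-en")
chat_model = ChatDatabricks(
    endpoint="databricks-meta-llama-3-1-70b-instruct",
    max_tokens=200,
)
```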
If your code was passing a token (e.g., from dbutils), remove it and re-log your model to UC.
4. Re-configure Your chatbot_poc Endpoint
This is the final step to tie it all together.
- Go to Machine Learning > Serving.
- Find and click your chatbot_poc endpoint.
- Click the Edit button (or "Edit configuration").
- Look for the "Run as" setting. By default, it's probably set to your user account ([user]@domain.com).
- Change this setting from your user to the Service Principal you configured in Step 2.
- Save the endpoint configuration.
The endpoint will restart. Once it's ready, it will now be running as the Service Principal. When your Streamlit app calls it, your RAG chain's internal calls to the embedding and LLM endpoints will use the SP's identity to generate new tokens on the fly, and the 1-hour expiry problem will be gone.
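Once the endpoint is back up, you can sanity-check it end to end. This sketch uses the MLflow deployments client; the payload shape is an assumption, so match it to your chain's logged signature:

```python
# Sketch: query chatbot_poc after the restart to verify the fix.
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")
resp = client.predict(
    endpoint="chatbot_poc",
    inputs={"dataframe_records": [{"query": "What is our vacation policy?"}]},
)
print(resp)
```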
Please let me know if the above does not solve your issue, and share any additional information you are seeing to help me diagnose what else might be going wrong.
Thank you!