Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Internal error 500 on Databricks vector search endpoint

Shubhankar_123
New Contributor

We are facing an internal 500 error when accessing the vector search endpoint through a Streamlit application. If I refresh the application the error sometimes goes away, but it has now become a regular occurrence. If I query the endpoint from the console I am able to fetch data, but through the Streamlit app hosted on Databricks it throws the internal 500 error.

The embedding model is OpenAI text-embedding-3-large.

We’re running the embedding process as a separate pipeline and storing the output. Then we’re using a truncated and normalized vector (768 dimensions) for the index creation.

The vector search process is then as follows:

Query string -> send string to create vector embedding -> truncate/normalise to get query vector -> send vector to similarity_search function

Here are some code snippets showing how it all works (it’s in a Streamlit app):

The vector search client authentication, using the service principal client ID/secret:

[screenshot of the authentication code]
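
In outline, it is roughly this pattern (the env-var names here are placeholders, not the real ones):

import os
from databricks.vector_search.client import VectorSearchClient

# Service principal auth; credentials are read from the app's environment
vsc = VectorSearchClient(
    workspace_url=os.environ["DATABRICKS_HOST"],
    service_principal_client_id=os.environ["DATABRICKS_CLIENT_ID"],
    service_principal_client_secret=os.environ["DATABRICKS_CLIENT_SECRET"],
)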

Running the similarity search:

[screenshot of the similarity search code]
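
Again in outline (the endpoint, index, and column names are placeholders):

# Fetch a handle to the index once, then query with the precomputed vector
index = vsc.get_index(
    endpoint_name="my_vs_endpoint",        # placeholder
    index_name="catalog.schema.my_index",  # placeholder
)
results = index.similarity_search(
    query_vector=query_vector,  # the truncated/normalised 768-dim embedding
    columns=["id", "text"],     # placeholder column names
    num_results=5,
)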

 

The embedding is handled by a separate function whose output is computed and stored before being sent to the similarity search. It runs against the serving endpoint within Databricks rather than directly against the underlying OpenAI resource. This part appears to be working OK.

[screenshot of the embedding function]
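
Roughly like this (the serving endpoint name is a placeholder):

from databricks.sdk import WorkspaceClient

def embed(text: str, w: WorkspaceClient) -> list[float]:
    # Call the Databricks serving endpoint that fronts text-embedding-3-large
    response = w.serving_endpoints.query(
        name="openai-embedding-endpoint",  # placeholder endpoint name
        input=[text],
    )
    full = response.data[0].embedding
    # Truncate to 768 dimensions and L2-normalise to match the index
    vec = full[:768]
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec]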

And the Workspace client authentication, again using the service principal client ID/secret:

[screenshot of the Workspace client authentication code]
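
In outline (same placeholder env-var names as above):

import os
from databricks.sdk import WorkspaceClient

# OAuth machine-to-machine auth with the same service principal
w = WorkspaceClient(
    host=os.environ["DATABRICKS_HOST"],
    client_id=os.environ["DATABRICKS_CLIENT_ID"],
    client_secret=os.environ["DATABRICKS_CLIENT_SECRET"],
)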

 

This is the stack trace of the internal error: 

ExceptionResponse content b'{"error_code":"INTERNAL_ERROR","message":"Something went wrong, please try again later","details":[{"@type":"type.googleapis.com/google.rpc.RequestInfo","request_id":"xxxxxxx","serving_data":""}]}', status_code 500

[screenshot of the stack trace]

 

1 REPLY

mark_ott
Databricks Employee

The intermittent Internal 500 errors you’re experiencing when accessing the vector search endpoint through a Streamlit app on Databricks—while direct console queries work—suggest an issue with the interaction between your Streamlit app’s environment and Databricks’ vector search serving endpoint. Here’s a structured approach to diagnose and address these errors:

Potential Causes

1. Resource Constraints or Rate Limiting

  • Streamlit apps sometimes share compute resources or network limits on Databricks. If your app is triggering bursty traffic or exceeds API rate limits, backend services may respond with 500 or similar errors.

  • If the same query works from the console but intermittently fails from Streamlit, the Streamlit runtime might be hitting timeouts or concurrency caps imposed by Databricks.

2. Session or Authentication Expiry

  • Service principal tokens (client ID/secret) issued to a Streamlit app may expire or become invalid during long-running sessions, especially if the app remains open and idle. The fact that a refresh often clears the error fits this pattern, since a refresh can trigger fresh authentication.

  • Check token refresh logic and ensure that credentials are not reused beyond their validity.

3. Payload or Request Serialization Issues

  • There may be subtle differences in how requests are serialized or transmitted from Streamlit compared to console clients (e.g., encoding, headers, vector format).

  • If your 768-dimension vectors are truncated or normalized inconsistently, malformed payloads might intermittently trigger backend crashes.

4. Concurrency and Race Conditions

  • Streamlit apps (especially multiuser ones) can generate concurrent queries that the backend may have difficulty handling unless concurrency is explicitly supported.

  • Log when errors occur—if multiple users or rapid queries trigger failures, rate limiting or concurrency throttling may be at play.

5. Backend Service Instability

  • The 500 error message is generic (“Something went wrong, please try again later”) and carries a google.rpc RequestInfo block. This is the standard error-detail format used by Databricks APIs rather than a sign of proxying to Google; it means the failure occurred inside the backend service, and the request_id in it is what Databricks support needs to trace the request server-side (a logging sketch to capture it follows this list).

  • There may be bugs or resource issues in the Databricks vector search endpoint, especially with load or malformed input.
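
To make those request IDs easy to capture, a minimal logging wrapper might look like this (a sketch; logged_search and the logger name are illustrative, not an existing API):

import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("vector-search")

def logged_search(index, query_vector, **kwargs):
    # Wrap the call so every failure is logged with timing and vector size,
    # alongside the error body that contains the request_id for support
    start = time.time()
    try:
        return index.similarity_search(query_vector=query_vector, **kwargs)
    except Exception as exc:
        log.error("similarity_search failed after %.2fs (dim=%d): %s",
                  time.time() - start, len(query_vector), exc)
        raise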

Diagnostic Steps

  • Collect Logs: Capture request payloads and response headers for both working and failing queries. Compare them for inconsistencies.

  • Monitor Streamlit Resource Usage: Monitor memory, CPU, and networking on the Databricks cluster hosting the Streamlit app. Look for spikes or exhaustion coinciding with errors.

  • Check Token & Session Expiry: Confirm that the token is still valid immediately before a failing request is made.

  • Reduce Query Frequency: Temporarily add throttling/debounce logic to Streamlit and see if 500s decrease.

  • Detailed Error Logging: Enable debug mode or detailed error logging on the backend API (if configurable in Databricks), to see backend error traces.

  • Verify Vector Format: Double-check that the precomputed, truncated, and normalized vectors submitted from Streamlit match those from your successful console calls (a quick check is sketched after this list).
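
For the vector format check, a guard you can drop in front of the search call (a sketch; check_query_vector is an illustrative helper, and 768 matches the index dimension described above):

import math

def check_query_vector(vec, expected_dim=768):
    # Log the properties that most often differ between clients:
    # dimensionality, L2 norm (~1.0 after normalisation), and element type
    norm = math.sqrt(sum(x * x for x in vec))
    print(f"dim={len(vec)} norm={norm:.6f} elem_type={type(vec[0]).__name__}")
    assert len(vec) == expected_dim, f"expected {expected_dim} dims, got {len(vec)}"
    return vec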

Common Remedies

  • Add Retry & Fallback Logic: In the Streamlit app code, implement retry logic with exponential backoff for 500 errors (while avoiding overwhelming the backend).

  • Explicit Re-authentication: Refresh tokens or re-acquire credentials prior to making a query if the session is stale (see the sketch after this list).

  • Streamlit/Databricks Version: Confirm you are not hitting a known bug in the specific version of Databricks or Streamlit—check release notes.

  • Adjust App Concurrency: Limit concurrent users or queries from the Streamlit UI to narrow down if this resolves errors.
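
For the re-authentication remedy, one simple pattern is to rebuild the client on demand instead of caching it for the lifetime of the session (a sketch; env-var names are placeholders):

import os
from databricks.vector_search.client import VectorSearchClient

def fresh_vector_client():
    # Re-acquire credentials rather than reusing a client object cached in
    # st.session_state for the whole lifetime of a long-running Streamlit app
    return VectorSearchClient(
        workspace_url=os.environ["DATABRICKS_HOST"],
        service_principal_client_id=os.environ["DATABRICKS_CLIENT_ID"],
        service_principal_client_secret=os.environ["DATABRICKS_CLIENT_SECRET"],
    )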

Example Debug Code Snippet

import time

def safe_vector_search(query_vector, max_retries=3):
    for attempt in range(max_retries):
        result = similarity_search(query_vector)  # your wrapped search function
        if result.status_code == 200:
            return result.json()
        elif result.status_code == 500:
            time.sleep(2 ** attempt)  # exponential backoff before retrying
        else:
            raise Exception(f"Unhandled error {result.status_code}: {result.content}")
    raise Exception("Vector search failed after retries")

Final Recommendations

  • Log the exact vector and payload for failed vs successful calls.

  • Coordinate with Databricks support: Provide error logs and request IDs—they can trace failed requests server-side for internal errors.

  • Test with simplified/known-good vectors to rule out edge cases in vector size/format.

By focusing on authentication flows, payload consistency, and concurrency/resource issues, you can quickly narrow down the root cause and avoid these intermittent 500 errors.