Hi @pikachu89,
Thanks for the clarification. If the client is already being created during inference, then the usual stale-client explanation does not apply. Even so, this still does not look like an expected 13-day token lifetime. In Model Serving, the SDK resolves auth per request, and the serving runtime re-reads the injected token source at a short cadence rather than holding a single token for days.
Given that a stop/start immediately fixes it, this points more to a server-side token-refresh or runtime auth issue than to something obviously wrong in your code. The best path is to raise a support ticket and include the exact timestamp of the failure, the HTTP status and response body, request IDs, and confirmation that the downstream endpoint permissions for the service account did not change.
It is also worth moving off get_open_ai_client(), since that path is deprecated in favour of the newer Databricks OpenAI client.
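As a rough sketch of what that migration can look like, the pattern below constructs the OpenAI-compatible base URL for Model Serving yourself and passes it to the standard OpenAI client. Note the assumptions: the endpoint name, host, and token values here are placeholders, and the exact client-construction call you should use may differ depending on your `databricks-sdk` version, so treat this as illustrative rather than definitive.

```python
# Hedged sketch: moving off the deprecated get_open_ai_client() path.
# Host, token, and endpoint name below are placeholders, not real resources.

def serving_base_url(workspace_host: str) -> str:
    """Model Serving exposes an OpenAI-compatible API under
    /serving-endpoints on the workspace host."""
    return f"{workspace_host.rstrip('/')}/serving-endpoints"

# With the openai package installed, the direct pattern looks roughly like
# this (kept as a comment so the sketch stays dependency-free):
#
#   from openai import OpenAI
#   client = OpenAI(
#       api_key=token,                      # placeholder credential
#       base_url=serving_base_url(host),    # e.g. https://<workspace>/serving-endpoints
#   )
#   resp = client.chat.completions.create(
#       model="my-serving-endpoint",        # placeholder endpoint name
#       messages=[{"role": "user", "content": "ping"}],
#   )

print(serving_base_url("https://adb-1234567890.1.azuredatabricks.net/"))
```

Because the client is constructed fresh with whatever token source you resolve at call time, this also sidesteps any question of the client caching stale credentials.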
If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.
Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***