Databricks app 504 Upstream request timeout

Skcmsa007
New Contributor

I have deployed my FastAPI application on Databricks Apps and set the keep-alive timeout to 1200.

Issue:

From the Databricks Swagger UI, I get a 504 "upstream request timeout" after 2 minutes, while my API takes 3 minutes to respond.

But on the backend my task completes, so it is clearly a Databricks gateway timeout.

I could not find any option in Databricks to increase the timeout. If Databricks Apps does not have such an option, we need to think twice, as we have used it in a client production deployment based on Databricks' suggestion.

 

Any quick solution?

Lu_Wang_ENB_DBX
Databricks Employee

TLDR: You cannot increase the upstream gateway timeout in Databricks Apps. The best practice and quick solution to handle operations that take longer than the gateway limit is to implement a "status pull" (polling) pattern.

Why the Timeout Occurs
Databricks Apps enforce strict ingress gateway timeouts to maintain platform stability. Increasing the keep-alive timeout in your FastAPI configuration only applies to the local container, not the Databricks ingress proxy that sits in front of it. When your request reaches the platform's hard limit (around 2 minutes), the gateway drops the connection and returns a 504 error, even if your backend task continues to run and eventually completes.

Recommended Solution: "Status Pull" Pattern
To resolve this for production applications, Databricks best practices dictate that you should prefer a "status pull" over long-running synchronous connections.
You can quickly architect this by doing the following:
1. Trigger and Return: Modify your initial FastAPI endpoint to kick off your 3-minute task in the background and immediately return an HTTP response (such as a 202 Accepted) containing a unique tracking identifier (e.g., task_id).
2. Poll for Status: Create a secondary endpoint (e.g., /status/{task_id}) that checks the state of the background task and returns whether it is pending, processing, or complete.
3. Client-Side Updates: Configure your frontend to periodically ping the status endpoint (for example, once every 5 seconds) until the operation finishes and the final payload is retrieved.
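
The server side of these steps could look roughly like the sketch below. It is framework-agnostic and uses only the Python standard library. The names `start_task`, `get_status`, `long_running_job`, and the in-memory `_tasks` registry are illustrative assumptions, not part of Databricks Apps or FastAPI. In a FastAPI app, `start_task` would be the body of the trigger endpoint (returning 202 Accepted) and `get_status` the body of the /status/{task_id} endpoint.

```python
import threading
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical in-memory registry: task_id -> Future.
# It lives in a single process, so a multi-worker deployment would need
# a shared store (database, cache) instead. That part is an assumption.
_executor = ThreadPoolExecutor(max_workers=4)
_tasks: dict[str, Future] = {}
_lock = threading.Lock()


def long_running_job(seconds: float) -> dict:
    """Stand-in for the real 3-minute task."""
    time.sleep(seconds)
    return {"slept": seconds}


def start_task(seconds: float) -> dict:
    """Trigger step: submit the work and return immediately with a task_id."""
    task_id = str(uuid.uuid4())
    future = _executor.submit(long_running_job, seconds)
    with _lock:
        _tasks[task_id] = future
    return {"task_id": task_id, "status": "pending"}


def get_status(task_id: str) -> dict:
    """Status step: report the task state without blocking the request."""
    with _lock:
        future = _tasks.get(task_id)
    if future is None:
        # An HTTP handler would typically map this to a 404.
        return {"task_id": task_id, "status": "not_found"}
    if not future.done():
        status = "processing" if future.running() else "pending"
        return {"task_id": task_id, "status": status}
    error = future.exception()
    if error is not None:
        return {"task_id": task_id, "status": "failed", "error": str(error)}
    return {"task_id": task_id, "status": "complete", "result": future.result()}
```

Because neither function waits for the job to finish, each HTTP request stays far below the roughly 2-minute gateway limit regardless of how long the job itself runs.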
This approach completely avoids the gateway timeout, frees up server resources, and aligns with the recommended runtime performance architecture for Databricks Apps.
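
For the client-side polling step, a minimal sketch of the loop is shown below. `poll_until_done` and its parameters are hypothetical names chosen for illustration. `fetch_status` is assumed to be any callable that performs one request to the status endpoint and returns the parsed JSON as a dict with a "status" key. The terminal states "complete" and "failed" are likewise assumptions about what your status endpoint reports.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> dict:
    """Call fetch_status repeatedly until a terminal state or the deadline."""
    deadline = time.monotonic() + timeout_s
    while True:
        payload = fetch_status()
        # Stop as soon as the task reports a terminal state.
        if payload.get("status") in ("complete", "failed"):
            return payload
        # Give up if the overall deadline has passed.
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"task still {payload.get('status')!r} after {timeout_s}s"
            )
        time.sleep(interval_s)
```

Each poll is a short request of its own, so the overall deadline (`timeout_s`) is enforced by the client rather than by the gateway.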

Please accept the solution if the recommendation worked for you.