Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

How to Increase HTTP Request Timeout for Databricks App Beyond 120 Seconds?

snarayan
New Contributor II

I’ve built a Databricks App using Gradio that leverages predict_stream to get streaming responses from a multi-agent supervisor. The app coordinates reasoning across four knowledge agents, so the model uses a long chain-of-thought process before returning a final answer.
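For context, the streaming call looks roughly like this (a minimal sketch; the endpoint name and the chunk schema are placeholders, not the actual app — the real agent's output shape may differ):

```python
def extract_delta(chunk: dict) -> str:
    """Pull the incremental text out of a chat-completions-style chunk.
    The schema here is an assumption; adjust to the agent's actual output."""
    choices = chunk.get("choices") or [{}]
    return choices[0].get("delta", {}).get("content", "") or ""

def stream_answer(question: str, endpoint: str = "my-supervisor-endpoint"):
    """Yield the growing answer as chunks arrive from the serving endpoint."""
    from mlflow.deployments import get_deploy_client  # local import: needs mlflow
    client = get_deploy_client("databricks")
    partial = ""
    for chunk in client.predict_stream(
        endpoint=endpoint,
        inputs={"messages": [{"role": "user", "content": question}]},
    ):
        partial += extract_delta(chunk)
        yield partial  # Gradio re-renders the output component on each yield
```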

Issue
Whenever the streaming response exceeds 120 seconds, the entire stream freezes. At that point, the logs also stop updating, which suggests the HTTP request is timing out. This is problematic because the reasoning process for complex queries often takes longer than two minutes. The rest of the app seems to work fine.

I’ve checked the app configuration but haven’t found any setting for request_timeout or similar. I’m not sure if this is something that needs to be configured in Model Serving, App settings, or elsewhere in Databricks.

What I’ve Tried

  • Verified the Gradio setup and streaming logic.
  • Looked through Databricks documentation for request timeout settings but couldn’t find anything specific for Apps or Model Serving.
  • Confirmed that the issue consistently occurs at exactly 120 seconds, which feels like a hard limit.

Question

  • Is there a way to increase the HTTP request timeout for Databricks Apps beyond 120 seconds?
  • Alternatively, is there a configuration to allow longer streaming responses from predict_stream in Model Serving?
  • Where should I set this—App settings, Model Serving endpoint, or somewhere else?

Any guidance or workaround would be greatly appreciated!

2 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @snarayan ,

I think you might be hitting the timeout of the model serving endpoint:

Debug model serving timeouts - Azure Databricks | Microsoft Learn

You can try to increase the timeout using environment variables, either through the Serving UI or programmatically in Python.

(screenshot: szymon_dybczak_0-1764660902814.png)

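Programmatically, the update could look something like this sketch using the Databricks SDK (the endpoint and entity names are placeholders, and whether these variables are honored by a given endpoint is exactly the open question here):

```python
def build_timeout_env_vars(predict_timeout_s: int = 300) -> dict:
    # Variable name taken from the MLflow docs linked above; value is in seconds.
    return {"MLFLOW_DEPLOYMENT_PREDICT_TIMEOUT": str(predict_timeout_s)}

def raise_endpoint_timeout(endpoint_name: str, entity_name: str,
                           entity_version: str) -> None:
    """Sketch: patch a serving endpoint's environment variables via the SDK."""
    from databricks.sdk import WorkspaceClient            # needs databricks-sdk
    from databricks.sdk.service.serving import ServedEntityInput

    w = WorkspaceClient()
    w.serving_endpoints.update_config(
        name=endpoint_name,
        served_entities=[
            ServedEntityInput(
                entity_name=entity_name,
                entity_version=entity_version,
                workload_size="Small",        # keep whatever the endpoint already uses
                scale_to_zero_enabled=True,
                environment_vars=build_timeout_env_vars(300),
            )
        ],
    )
```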

Thanks for the info!
 
I tried setting MLFLOW_DEPLOYMENT_PREDICT_TIMEOUT to 300 and MLFLOW_DEPLOYMENT_PREDICT_TOTAL_TIMEOUT to 600 in my Gradio app. The app uses the predict_stream function to call a Multi Agent Supervisor agent built with Agent Bricks, but these timeout settings don't seem to take effect. I am using os.environ to set them; please let me know if there is a different way to do this.
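Concretely, I'm setting them at the top of the app, before any MLflow client is created (whether the client actually reads these values, or only the serving container does, is the part I'm unsure about):

```python
import os

# Set before the MLflow deployments client is created, in case the values
# are read once at initialization time (an assumption, not verified).
os.environ["MLFLOW_DEPLOYMENT_PREDICT_TIMEOUT"] = "300"
os.environ["MLFLOW_DEPLOYMENT_PREDICT_TOTAL_TIMEOUT"] = "600"

def make_client():
    from mlflow.deployments import get_deploy_client  # local import: needs mlflow
    return get_deploy_client("databricks")
```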
 
I don't see any timeout error codes in the Gradio app deployment logs or through my debug print statements, so I can't identify the specific timeout. The stream consistently freezes after 120 seconds, and the backend app logs also stop updating. When I refresh the app, it works normally again.
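One way to pin down exactly when the stream dies is to timestamp each chunk as it arrives, so the logs show how long the stream stays alive (a small sketch; `stream` stands for whatever iterator predict_stream returns):

```python
import time

def timed_chunks(stream):
    """Wrap a chunk iterator, printing seconds since start for each chunk,
    so the logs show exactly when the stream stops delivering."""
    start = time.monotonic()
    for i, chunk in enumerate(stream):
        print(f"[{time.monotonic() - start:7.2f}s] chunk {i} received")
        yield chunk
```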
 
My understanding is that we can set the timeout during model creation or modify it after the model has been created.
 
However, since this is a Multi Agent Supervisor, the model is backed by a foundation model (Claude Sonnet / GPT OSS), as I learned from the docs. Even though the MLflow configuration docs allow adding environment variables to serving endpoints in general, I couldn't find an explicit statement guaranteeing that MLFLOW_DEPLOYMENT_PREDICT_TIMEOUT and MLFLOW_DEPLOYMENT_PREDICT_TOTAL_TIMEOUT are honored for foundation-model endpoints served via the standard foundation-model serving stack. I assume I don't have access to change their configuration, since this is managed by Agent Bricks.