Thanks for the info!
I tried setting MLFLOW_DEPLOYMENT_PREDICT_TIMEOUT to 300 and MLFLOW_DEPLOYMENT_PREDICT_TOTAL_TIMEOUT to 600 in my Gradio app. The app uses the predict_stream function to call a Multi-Agent Supervisor agent built with Agent Bricks, but the timeout configuration doesn't seem to take effect. I am setting these limits via os.environ; please let me know if there is a different way to set this up.
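For reference, here is roughly what my setup looks like (the endpoint name and payload below are placeholders, and I set the variables before creating the deployments client in case they are read early):

```python
import os

# Timeouts in seconds, set before any MLflow client is created.
os.environ["MLFLOW_DEPLOYMENT_PREDICT_TIMEOUT"] = "300"
os.environ["MLFLOW_DEPLOYMENT_PREDICT_TOTAL_TIMEOUT"] = "600"

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

# Placeholder endpoint name and chat-style payload for the supervisor agent.
for chunk in client.predict_stream(
    endpoint="agents_my-supervisor",
    inputs={"messages": [{"role": "user", "content": "Run the long analysis"}]},
):
    print(chunk)
```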
I don't see any timeout error codes in the Gradio app deployment logs, and my debug print statements never surface a specific timeout error either. The stream consistently freezes after 120 seconds, the backend app logs stop updating at the same point, and no timeout error is raised. When I refresh the app, it works normally again.
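To pin down where it stalls, I wrap the stream in a small timing generator of my own (nothing MLflow-specific). The final line never prints when the freeze happens, which makes me suspect a silent client-side cutoff; as far as I can tell, 120 seconds is also the default value of MLFLOW_DEPLOYMENT_PREDICT_TIMEOUT, so my override may simply not be taking effect:

```python
import time

def traced_stream(stream):
    """Debug helper: log the gap between streamed chunks."""
    last = time.monotonic()
    count = 0
    for chunk in stream:
        now = time.monotonic()
        print(f"chunk {count}: +{now - last:.1f}s since previous chunk")
        last = now
        count += 1
        yield chunk
    # If the stream silently dies at ~120s, this line never prints.
    print(f"stream ended cleanly after {count} chunks")

# Usage: wrap the predict_stream iterator before consuming it in Gradio.
# for chunk in traced_stream(client.predict_stream(...)):
#     ...
```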
My understanding is that we can set the timeout seconds during model creation, or modify it after the model is created.
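For an endpoint I controlled myself, my understanding is that this would look something like the sketch below (Databricks SDK; the endpoint and entity names are placeholders, and I haven't verified that these particular variables are actually honored by the serving container):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ServedEntityInput

w = WorkspaceClient()

# Sketch only: push the timeout variables into a served entity's environment.
# Endpoint/entity names are placeholders for an endpoint I would own myself.
w.serving_endpoints.update_config(
    name="my-agent-endpoint",
    served_entities=[
        ServedEntityInput(
            entity_name="catalog.schema.my_agent_model",
            entity_version="3",
            workload_size="Small",
            scale_to_zero_enabled=True,
            environment_vars={
                "MLFLOW_DEPLOYMENT_PREDICT_TIMEOUT": "300",
                "MLFLOW_DEPLOYMENT_PREDICT_TOTAL_TIMEOUT": "600",
            },
        )
    ],
)
```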
However, since this is a Multi-Agent Supervisor, the model is backed by a foundation model (Claude Sonnet / GPT OSS), as I learned from the docs. Even though the MLflow configuration docs allow adding environment variables to serving endpoints in general, I couldn't find an explicit statement guaranteeing that MLFLOW_DEPLOYMENT_PREDICT_TIMEOUT and MLFLOW_DEPLOYMENT_PREDICT_TOTAL_TIMEOUT are honored for foundation-model endpoints served via the standard foundation-model serving stack. I assume I don't have access to change their configuration, since it is managed by Agent Bricks.