How to reduce scale-to-zero time in MLflow Serving
Hi, I am deploying MLflow models using Databricks serverless serving, but it seems the servers scale down to 0 only after 30 minutes of inactivity. Is there any way to reduce this time? Also, is it possible to deploy multiple models under a single endpoint? I want...
Latest Reply
Regarding your first question about reducing the scale-down time of Databricks serverless serving: the system is currently designed to scale down to zero after 30 minutes of inactivity. This is to ensure that instances are kept warm to handle any su...
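For anyone who wants to see what the related settings look like, here is a minimal sketch of a request body for creating a serving endpoint. It only builds the JSON payload and does not call any API. The field names (`served_models`, `scale_to_zero_enabled`, `traffic_config`, `routes`) follow the Databricks serving-endpoints REST API as I understand it, so please check them against the current documentation. The endpoint name, model names, and the `build_endpoint_config` helper are hypothetical. Note that the flag only turns scale-to-zero on or off per served model; it does not change the 30-minute idle window mentioned above.

```python
import json


def build_endpoint_config(endpoint_name, models, scale_to_zero=True):
    """Build a request body for creating a serving endpoint (sketch only).

    models: list of (model_name, model_version, traffic_percentage) tuples.
    Field names are assumed from the Databricks serving-endpoints REST API.
    """
    total = sum(pct for _, _, pct in models)
    if total != 100:
        raise ValueError(f"traffic percentages must sum to 100, got {total}")

    served_models = []
    routes = []
    for model_name, version, pct in models:
        served_name = f"{model_name}-{version}"
        served_models.append({
            "name": served_name,
            "model_name": model_name,
            "model_version": str(version),
            "workload_size": "Small",
            # Toggles scale-to-zero on or off; the idle timeout itself is not set here.
            "scale_to_zero_enabled": scale_to_zero,
        })
        routes.append({
            "served_model_name": served_name,
            "traffic_percentage": pct,
        })

    return {
        "name": endpoint_name,
        "config": {
            "served_models": served_models,
            "traffic_config": {"routes": routes},
        },
    }


# Hypothetical endpoint with two models behind it, kept warm (no scale-to-zero).
payload = build_endpoint_config(
    "demo-endpoint",
    [("model-a", 1, 80), ("model-b", 3, 20)],
    scale_to_zero=False,
)
print(json.dumps(payload, indent=2))
```

Setting `scale_to_zero=False` in this sketch keeps the served models provisioned, which trades idle cost for the absence of cold starts. The two-model list illustrates how a traffic split across models under one endpoint might be expressed.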