How to reduce scale-to-zero time in MLflow Serving

sanjay
Valued Contributor II

Hi,

I am deploying MLflow models using Databricks serverless serving, but it seems the servers scale down to zero only after 30 minutes of inactivity. Is there any way to reduce this time?

Also, is it possible to deploy multiple models under a single endpoint? I want to run multiple models on one endpoint to reduce cost, similar to the multi-model deployment functionality that AWS SageMaker provides.

Appreciate any help.

Regards,
Sanjay

1 REPLY

Walter_C
Databricks Employee

Regarding your first question about reducing the scale-down time of Databricks serverless serving: the system is designed to scale down to zero only after 30 minutes of inactivity. This keeps instances warm to absorb sudden spikes in traffic while still cutting costs for non-24/7 or development workloads. Unfortunately, there is no direct way to shorten this window; it is part of the auto-scaling algorithm used by Databricks.
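
For what it's worth, the endpoint configuration exposes scale-to-zero only as an on/off switch per served model; there is no idle-timeout field to tune. A minimal sketch of one served-model entry, assuming the public serving-endpoints REST API schema (the catalog, schema, model name, and version below are placeholders):

# One served-model entry in a serving-endpoint config (Python dict for the
# REST payload). scale_to_zero_enabled is a boolean toggle; the ~30-minute
# idle window itself is not a configurable field.
served_entity = {
    "entity_name": "my_catalog.my_schema.my_model",  # placeholder model
    "entity_version": "1",
    "workload_size": "Small",
    "scale_to_zero_enabled": True,  # on/off only; no custom idle timeout
}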


For your second question about deploying multiple models under a single endpoint: yes, Databricks supports this. You can serve multiple models from a single CPU serving endpoint with Databricks Model Serving; an endpoint can serve any registered Python MLflow model from the Model Registry. You create one endpoint with multiple served models and set a traffic split between them. For example, one model (call it "current") can receive 90% of the endpoint traffic while another ("challenger") receives the remaining 10%, and you can update the split between served models at any time.

https://docs.databricks.com/en/machine-learning/model-serving/serve-multiple-models-to-serving-endpo...
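
To make the traffic-split setup concrete, here is a minimal sketch, assuming the public serving-endpoints REST API schema. The endpoint name, model names, and versions are all placeholders; check the doc link above for the current field names.

# Minimal sketch: create one serving endpoint that serves two models with a
# 90/10 traffic split via POST /api/2.0/serving-endpoints.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace-url>
token = os.environ["DATABRICKS_TOKEN"]  # personal access token

payload = {
    "name": "multi-model-endpoint",  # placeholder endpoint name
    "config": {
        "served_entities": [
            {
                "name": "current",  # label referenced by the traffic routes
                "entity_name": "my_catalog.my_schema.current_model",  # placeholder
                "entity_version": "2",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            },
            {
                "name": "challenger",
                "entity_name": "my_catalog.my_schema.challenger_model",  # placeholder
                "entity_version": "1",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            },
        ],
        # 90% of requests go to "current", 10% to "challenger".
        "traffic_config": {
            "routes": [
                {"served_model_name": "current", "traffic_percentage": 90},
                {"served_model_name": "challenger", "traffic_percentage": 10},
            ]
        },
    },
}

resp = requests.post(
    f"{host}/api/2.0/serving-endpoints",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()

Once the endpoint exists, the same traffic_config can be updated with PUT /api/2.0/serving-endpoints/<endpoint-name>/config to shift the split between the served models without creating a new endpoint.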

 
