
Sporadic (and cost-efficient) Model Serving on Databricks?

cbossi
New Contributor II

Hi all,

I'm new to Databricks so would appreciate some advice.

I have an ML model deployed using Databricks Model Serving. My use case is very sporadic: I only need to make 5–15 prediction requests per day (industrial application), and there can be long idle periods between requests. I've noticed that after a cold start, the serving cluster stays up for at least 30 minutes (the minimum idle timeout), and I am billed for this entire period, even if no further requests are made.

Is there any way to serve models on Databricks where I only pay for actual requests (compute time), and not for idle time? Or are there recommended alternatives, perhaps via integration with other Azure services?

Thanks for any advice!

1 ACCEPTED SOLUTION

KaushalVachhani
Databricks Employee

Hi @cbossi, you are right!

The endpoint scales down only after a 30-minute idle period, and you are billed for the compute resources used during that idle window in addition to the actual serving time when requests are made. This is the current expected behaviour; the idle timeout cannot currently be reduced below 30 minutes.
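
For reference, the 30-minute window applies even when scale-to-zero is enabled on the endpoint. Here is a minimal sketch of that configuration using the Databricks Python SDK (databricks-sdk); the endpoint and model names are hypothetical:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import (
    EndpointCoreConfigInput,
    ServedEntityInput,
)

w = WorkspaceClient()  # authenticates from the environment / notebook context

w.serving_endpoints.create(
    name="sporadic-model-endpoint",  # hypothetical endpoint name
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name="my_industrial_model",  # hypothetical registered model
                entity_version="1",
                workload_size="Small",
                # Scale-to-zero stops billing while the endpoint is down, but the
                # endpoint only scales down after the fixed 30-minute idle period.
                scale_to_zero_enabled=True,
            )
        ]
    ),
)
```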

If your use case does not require real-time predictions, batch inference is a better fit: accumulate the requests throughout the day and score them all at once with a scheduled job (see the sketch below). Alternatively, you can explore Azure Functions to host the model, which on a consumption plan bills only for actual executions.
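
A minimal sketch of the batch pattern, assuming incoming requests are appended to a Delta table during the day and a scheduled Databricks job scores them in one pass (the table names and model URI below are hypothetical):

```python
import mlflow

# Load the registered model as a Spark UDF so scoring runs on the job cluster.
# `spark` is predefined in Databricks notebooks and jobs.
predict_udf = mlflow.pyfunc.spark_udf(
    spark, model_uri="models:/my_industrial_model/1"  # hypothetical model URI
)

# Read the requests accumulated since the last run (hypothetical input table).
pending = spark.table("factory.requests_pending")

# Score every pending request in one pass.
feature_cols = [c for c in pending.columns if c != "request_id"]
scored = pending.withColumn("prediction", predict_udf(*feature_cols))

# Persist the results (hypothetical output table).
scored.write.mode("append").saveAsTable("factory.predictions")
```

Scheduled once a day as a Databricks job, the cluster only runs for the few minutes the scoring takes, so you pay for actual compute rather than idle time.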

