<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Topic: Options for sporadic (and cost-efficient) Model Serving on Databricks? in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/options-sporadic-and-cost-efficient-model-serving-on-databricks/m-p/137756#M4408</link>
    <description>&lt;P&gt;&lt;SPAN&gt;Hi all,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I'm new to Databricks, so I would appreciate some advice.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I have an ML model deployed using Databricks Model Serving. My use case is very sporadic: I only need to make 5–15 prediction requests per day (industrial application), and there can be long idle periods between requests. I’ve noticed that after a cold start, the serving cluster stays up for at least 30 minutes (the minimum idle timeout), and I am billed for this entire period, even if no further requests are made.&lt;/P&gt;&lt;P&gt;Is there any way to serve models on Databricks where I only pay for actual requests (compute time), and not for idle time? Or are there recommended alternatives, perhaps via integration with other Azure services?&lt;/P&gt;&lt;P&gt;Thanks for any advice!&lt;/P&gt;</description>
    <pubDate>Wed, 05 Nov 2025 12:47:18 GMT</pubDate>
    <dc:creator>cbossi</dc:creator>
    <dc:date>2025-11-05T12:47:18Z</dc:date>
    <item>
      <title>Options for sporadic (and cost-efficient) Model Serving on Databricks?</title>
      <link>https://community.databricks.com/t5/machine-learning/options-sporadic-and-cost-efficient-model-serving-on-databricks/m-p/137756#M4408</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hi all,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I'm new to Databricks, so I would appreciate some advice.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I have an ML model deployed using Databricks Model Serving. My use case is very sporadic: I only need to make 5–15 prediction requests per day (industrial application), and there can be long idle periods between requests. I’ve noticed that after a cold start, the serving cluster stays up for at least 30 minutes (the minimum idle timeout), and I am billed for this entire period, even if no further requests are made.&lt;/P&gt;&lt;P&gt;Is there any way to serve models on Databricks where I only pay for actual requests (compute time), and not for idle time? Or are there recommended alternatives, perhaps via integration with other Azure services?&lt;/P&gt;&lt;P&gt;Thanks for any advice!&lt;/P&gt;</description>
      <pubDate>Wed, 05 Nov 2025 12:47:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/options-sporadic-and-cost-efficient-model-serving-on-databricks/m-p/137756#M4408</guid>
      <dc:creator>cbossi</dc:creator>
      <dc:date>2025-11-05T12:47:18Z</dc:date>
    </item>
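A minimal sketch of the billing concern raised above: if requests are spread out enough that each one triggers a cold start, every request is billed for its serving time plus the ~30-minute minimum idle window before scale-down. The minute values below are illustrative placeholders, not Databricks pricing figures.

```python
# Worst-case billed time for a sporadic workload: each request arrives after
# the endpoint has already scaled down, so each one pays the full idle window.
# All durations are illustrative assumptions, not actual Databricks terms.

def billed_hours_per_day(requests_per_day: int,
                         idle_minutes: float = 30.0,
                         serve_minutes: float = 1.0) -> float:
    """Hours billed per day if every request triggers a fresh cold start."""
    return requests_per_day * (serve_minutes + idle_minutes) / 60.0

# 15 isolated requests/day -> 15 * 31 min = 7.75 billed hours
# for roughly 15 minutes of actual prediction work.
```

This is why the idle timeout, not the prediction time itself, dominates the cost for this kind of workload.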
    <item>
      <title>Re: Options for sporadic (and cost-efficient) Model Serving on Databricks?</title>
      <link>https://community.databricks.com/t5/machine-learning/options-sporadic-and-cost-efficient-model-serving-on-databricks/m-p/137777#M4409</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/195220"&gt;@cbossi&lt;/a&gt;, you are right!&lt;/P&gt;
&lt;P&gt;The endpoint stays up for a 30-minute idle period before scaling down, and you are billed for the compute resources used during that window in addition to the actual serving time when requests are made. This is the current expected behaviour, and the idle timeout cannot currently be reduced below 30 minutes.&lt;/P&gt;
&lt;P&gt;If your use case does not require real-time predictions, batch inference is a better fit: accumulate requests throughout the day and score them all at once. Alternatively, you can explore hosting the model on Azure Functions.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Nov 2025 13:29:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/options-sporadic-and-cost-efficient-model-serving-on-databricks/m-p/137777#M4409</guid>
      <dc:creator>KaushalVachhani</dc:creator>
      <dc:date>2025-11-05T13:29:26Z</dc:date>
    </item>
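The batch alternative suggested in the reply can be sketched as a simple accumulate-then-flush pattern: queue the day's prediction requests and score them in a single pass, so model compute runs only once per day. `predict_fn` below stands in for any loaded model's predict call (e.g. an MLflow pyfunc model); all names are illustrative, not a Databricks API.

```python
# Accumulate prediction requests during the day, then score them in one batch.
# This is a generic sketch of the pattern, not Databricks-specific code.
from typing import Callable, List

class DailyBatchScorer:
    def __init__(self, predict_fn: Callable[[List[dict]], List[float]]):
        self.predict_fn = predict_fn  # any model's batch predict function
        self.pending: List[dict] = []

    def submit(self, features: dict) -> None:
        """Queue a request instead of hitting a real-time endpoint."""
        self.pending.append(features)

    def flush(self) -> List[float]:
        """Score all accumulated requests in a single batch, then clear."""
        if not self.pending:
            return []
        predictions = self.predict_fn(self.pending)
        self.pending = []
        return predictions
```

In practice `submit` would write rows to a table and `flush` would run as a scheduled daily job, so no serving endpoint needs to stay warm between requests.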
  </channel>
</rss>

