Hi all,
I'm new to Databricks so would appreciate some advice.
I have an ML model deployed with Databricks Model Serving. My use case is very sporadic (an industrial application): I only need to make 5–15 prediction requests per day, often with long idle periods between them. I've noticed that after a cold start, the serving cluster stays up for at least 30 minutes (the minimum idle timeout), and I'm billed for that entire period even if no further requests arrive.
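For a rough sense of scale, here is a back-of-envelope sketch of the worst case. It assumes that every request arrives after the endpoint has already scaled back down, so each one triggers its own cold start followed by the full 30-minute idle window. It also ignores the cold-start time itself and any per-unit pricing, so it only counts billed minutes, not cost.

```python
# Worst-case billed uptime per day, ASSUMING every request is isolated:
# each one triggers a cold start, then the endpoint idles for the full
# minimum idle timeout before scaling down again.
MIN_IDLE_MINUTES = 30  # minimum idle timeout observed on the endpoint


def billed_minutes_per_day(requests_per_day: int, idle_minutes: int = MIN_IDLE_MINUTES) -> int:
    """Upper bound on billed minutes when no two requests share an idle window."""
    return requests_per_day * idle_minutes


for n in (5, 15):
    minutes = billed_minutes_per_day(n)
    print(f"{n} requests/day -> up to {minutes} min ({minutes / 60:.1f} h) billed")
```

Under that assumption, 5 requests/day means up to 150 billed minutes (2.5 h) and 15 requests/day means up to 450 minutes (7.5 h), while the actual inference time is only a small fraction of that.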
Is there any way to serve models on Databricks so that I pay only for actual request processing (compute time) and not for idle time? Or are there recommended alternatives, perhaps through integration with other Azure services?
Thanks for any advice!