WiliamRosa
Databricks Partner

Hi @gbhatia,

I’d need a few more details to fully understand your deployment, but in general the following settings help in DEV/UAT. Set Compute type to CPU, which is cheaper and sufficient for testing. Set Compute scale-out to Small (0–4 concurrency, 0–4 DBU), since you don’t need high concurrency in DEV/UAT. Keep Scale to zero disabled so the endpoint is always ready and you avoid cold starts; this increases costs somewhat because the endpoint stays up while idle, but it makes testing much faster.

For production, the recommended practice is to use larger instance sizes and more replicas, and to enable Scale to zero only for truly intermittent workloads.
https://docs.databricks.com/aws/en/machine-learning/model-serving/create-manage-serving-endpoints
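
If you prefer to create the endpoint through the REST API rather than the UI, the settings above could be expressed roughly as in the sketch below. It only builds and prints the request body for POST /api/2.0/serving-endpoints; it does not call your workspace. The endpoint name, model name, and version are hypothetical placeholders, and the field names reflect my understanding of the serving endpoints API, so please check them against the documentation linked above before using them.

```python
import json


def build_dev_endpoint_config(endpoint_name, model_name, model_version):
    """Build a request body for a small, always-on CPU endpoint (DEV/UAT).

    All argument values are placeholders supplied by the caller.
    """
    return {
        "name": endpoint_name,
        "config": {
            "served_entities": [
                {
                    "entity_name": model_name,
                    "entity_version": str(model_version),
                    # Compute type: CPU is cheaper and sufficient for testing.
                    "workload_type": "CPU",
                    # Compute scale-out: Small (0-4 concurrency).
                    "workload_size": "Small",
                    # Disabled to avoid cold starts; the endpoint stays up while idle.
                    "scale_to_zero_enabled": False,
                }
            ]
        },
    }


# Hypothetical names, for illustration only.
payload = build_dev_endpoint_config("my-model-dev", "main.default.my_model", 1)
print(json.dumps(payload, indent=2))
```

You would send this body with your usual HTTP client and a workspace token. For an intermittent workload you could flip scale_to_zero_enabled to True and accept the cold start on the first request.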

Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa