- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-15-2025 09:01 PM
Hi @gbhatia,
I’d need a few more details to fully understand your deployment, but in general, what can help is setting Compute type: CPU (cheaper and sufficient for testing), Compute scale-out: Small (0–4 concurrency, 0–4 DBU) since you don’t need high concurrency in DEV/UAT, and keeping Scale to zero disabled to avoid cold starts and have the endpoint always ready — noting that this increases costs slightly but makes testing much faster; for production, the recommended practice is to use larger instance sizes, more replicas, and only enable scale to zero for truly intermittent workloads.
https://docs.databricks.com/aws/en/machine-learning/model-serving/create-manage-serving-endpoints
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa