12-10-2025 12:19 PM
I'm a software engineer and a bit new to Databricks. My goal is to create a model serving endpoint that interfaces with several ML models. Traditionally this would look like:
API --> Service --> Data
Now, using Databricks, my understanding is that it will look like:
Models Serving Endpoint --> Service Model --> ML Model
From a best-practices perspective, what is the best way to deploy? A single DAB that bundles the resources to a single cluster? Multiple deployed models/clusters in more of a microservice fashion?
Also, is the service model even necessary?
I can see benefits to each approach, and I'm certain there are aspects I'm overlooking. I'd love to hear how others are deploying.
12-10-2025 07:27 PM
Hi @DBXDeveloper111 ,
A Model Serving endpoint is the “service”: it exposes a REST API and handles autoscaling on serverless compute. You don’t manage clusters for online inference. Each endpoint hosts one or more served entities (models/functions), which you reference and route to by name and version. You configure these in the endpoint’s served_entities section (via UI, REST, SDK, or MLflow Deployments). A separate “service model” is not required. Pre/post‑processing can live inside the model wrapper (MLflow pyfunc) or as a function/agent deployed to Model Serving if you need to orchestrate multiple backends.
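For concreteness, here's a minimal sketch of that configuration from code using the MLflow Deployments client. The endpoint name, Unity Catalog model names, and versions below are placeholders, not anything from your workspace:

```python
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

# Hypothetical endpoint hosting two served entities; swap in your own
# Unity Catalog model names and versions.
client.create_endpoint(
    name="scoring-endpoint",
    config={
        "served_entities": [
            {
                "name": "model-a",
                "entity_name": "main.ml.model_a",
                "entity_version": "3",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            },
            {
                "name": "model-b",
                "entity_name": "main.ml.model_b",
                "entity_version": "1",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            },
        ],
        # Route requests between the served entities by name.
        "traffic_config": {
            "routes": [
                {"served_model_name": "model-a", "traffic_percentage": 80},
                {"served_model_name": "model-b", "traffic_percentage": 20},
            ]
        },
    },
)
```

Clients then call the endpoint's REST invocations URL, and Databricks scales the serverless compute behind it.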
You only need a separate service layer if you’re coordinating multiple models/tools or enforcing cross‑cutting policies that don’t fit neatly in one model’s code. In that case, deploy an orchestrator function/agent to Model Serving and keep the client contract stable.
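If you do need that orchestration layer, here's a rough sketch of a pyfunc wrapper that pre-processes, fans out to two backend endpoints, and post-processes. The backend endpoint names and the blending logic are assumptions for illustration:

```python
import mlflow
import pandas as pd
from mlflow.deployments import get_deploy_client


class Orchestrator(mlflow.pyfunc.PythonModel):
    """Pre-process, call backend serving endpoints, post-process."""

    def load_context(self, context):
        # Resolves workspace auth automatically when running in Model Serving.
        self.client = get_deploy_client("databricks")

    def predict(self, context, model_input: pd.DataFrame) -> pd.DataFrame:
        # Shared pre-processing.
        features = model_input.fillna(0)
        payload = {"dataframe_split": features.to_dict(orient="split")}

        # Fan out to the backend endpoints (hypothetical names).
        resp_a = self.client.predict(endpoint="model-a-endpoint", inputs=payload)
        resp_b = self.client.predict(endpoint="model-b-endpoint", inputs=payload)

        # Post-process / blend; the "predictions" response shape is the usual
        # Model Serving format, but verify it against your own models.
        blended = [
            0.5 * a + 0.5 * b
            for a, b in zip(resp_a["predictions"], resp_b["predictions"])
        ]
        return pd.DataFrame({"score": blended})
```

You register that wrapper like any other model and serve it from its own endpoint; the client contract stays fixed even if you swap the backends later.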
12-11-2025 07:15 AM
It sounds like I need to create the "service wrapper" that will do the pre-processing, fetching of env vars, etc. I'll deploy that using a model serving endpoint (serverless for speed), and each sub-model will run on its own compute that scales independently.
Thanks for the great feedback
12-11-2025 05:28 AM
Just register the model and then deploy a serving endpoint to serve it.
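In its simplest form that flow looks roughly like the sketch below; the Unity Catalog model name, endpoint name, and the trained `model` object are placeholders:

```python
import mlflow
from mlflow.deployments import get_deploy_client

mlflow.set_registry_uri("databricks-uc")

# 1) Log and register the model (`model` is an already-trained sklearn model).
with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="main.ml.my_model",
    )

# 2) Deploy a serving endpoint for the registered version.
get_deploy_client("databricks").create_endpoint(
    name="my-model-endpoint",
    config={
        "served_entities": [{
            "entity_name": "main.ml.my_model",
            "entity_version": "1",  # the version created by the registration above
            "workload_size": "Small",
            "scale_to_zero_enabled": True,
        }]
    },
)
```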