How to serve a Unity Catalog ML model for external use

johndoe99012
New Contributor II

Hello everyone

 

I am following this notebook tutorial

 

https://docs.databricks.com/en/machine-learning/manage-model-lifecycle/index.html#example-notebook

 

Now I can register a machine learning model in Unity Catalog, but the tutorial only shows how to use it from inside Databricks.

 

How could I deploy the model for external usage, for instance, to call the model via an API from my computer (similar to AWS API Gateway)?

 

Many thanks

 

filipniziol
Esteemed Contributor

 

Hi @johndoe99012 ,

If you want to make a registered Databricks model accessible via external calls, similar to how you might use AWS API Gateway, you can use Databricks Model Serving. This feature lets you host a model as a REST endpoint and call it from outside Databricks.


Key Steps:

  1. Enable Model Serving:
    Ensure that your workspace is in a region where Model Serving is supported. You can find the list of supported regions here:
    Azure Databricks Feature & Region Support – Model Serving

  2. Deploy Your Model to a Serving Endpoint:
    Once your model is registered in Unity Catalog, you can create a serving endpoint from the Databricks UI or via the CLI.
    Databricks Documentation on Model Serving

  3. Obtain the Endpoint URL and Authentication Credentials:
    After the serving endpoint is live, Databricks provides a REST URL. You can call this URL from your external applications using standard HTTP requests. You'll need to include the appropriate authentication token.
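As a rough sketch of step 3, calling the endpoint from your own computer could look like the following. The workspace URL, endpoint name, token, and feature names are placeholders I made up for illustration, and the `dataframe_records` payload assumes a model that takes tabular input; adjust it to whatever your model's signature expects. It only uses the Python standard library.

```python
import json
import urllib.request


def build_invocation_request(workspace_url, endpoint_name, token, records):
    """Build (but do not send) a POST request for a serving endpoint."""
    # Serving endpoints are invoked at /serving-endpoints/<name>/invocations.
    url = f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    # Assumption: the model accepts tabular rows, sent as "dataframe_records".
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query_endpoint(request, timeout=60):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical values: replace with your own workspace, endpoint, token, features.
request = build_invocation_request(
    "https://example.cloud.databricks.com",
    "my-endpoint",
    "<your-access-token>",
    [{"feature_a": 1.0, "feature_b": 2.0}],
)
print(request.full_url)
# To actually call the endpoint: predictions = query_endpoint(request)
```

Building the request separately from sending it makes it easy to check the URL and headers before any network call is made.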


 

Thank you for your great answer. I am trying to understand the Mosaic pricing scheme, and I found this URL: https://www.databricks.com/product/pricing/model-serving.

If I understand correctly, Mosaic will charge me similarly to AWS Lambda, i.e. I pay per call to the Mosaic API.

Hi @johndoe99012 ,

You pay for DBUs (Databricks Units).

If you select "scale to zero", you pay nothing while the endpoint is scaled down, but the endpoint is also not running during that time.
The first HTTP request therefore takes a while, because the endpoint has to start first, so this is not recommended for production.
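If you do use "scale to zero" outside production, the client can be made more tolerant of that slow first request. A minimal sketch, assuming a simple retry with exponential backoff; the helper name `call_with_retry` and the delay values are my own choices for illustration, not anything Databricks prescribes:

```python
import time


def call_with_retry(send, attempts=5, initial_delay=2.0, backoff=2.0, sleep=time.sleep):
    """Call send() until it succeeds, waiting longer after each failure."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except OSError:
            # OSError covers socket timeouts and urllib's URLError/HTTPError.
            if attempt == attempts:
                raise  # give up after the last attempt
            sleep(delay)
            delay *= backoff  # wait 2s, 4s, 8s, ... between attempts
```

Here `send` is any zero-argument function that performs the HTTP request, so the same helper works with `urllib`, `requests`, or another client whose errors derive from `OSError`.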

If you select Small compute without scaling to zero, you will be using 1-4 DBUs.
When there are no requests, or only a limited number of requests, you pay for 1 DBU, which is 0.07 USD per hour as per the documentation. If there are many requests and the endpoint is scaled out to its maximum of 4 DBUs, you pay 4 x 0.07 USD = 0.28 USD per hour.
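The arithmetic above can be written out as a small calculation. The 0.07 USD rate is simply the figure quoted in this thread, so treat it as an assumption and check the pricing page for the current value in your cloud and region:

```python
USD_PER_DBU_HOUR = 0.07  # rate quoted in this thread; may differ for your setup


def serving_cost(dbus, hours=1.0, rate=USD_PER_DBU_HOUR):
    """Cost in USD of running an endpoint at `dbus` DBUs for `hours` hours."""
    return dbus * hours * rate


for dbus in (1, 4):  # idle/low traffic vs. fully scaled out on Small compute
    print(f"{dbus} DBU: {serving_cost(dbus):.2f} USD per hour")
```

Passing a larger `hours` value gives a rough estimate for a day or a month at a constant load.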


 

filipniziol
Esteemed Contributor

Hi @johndoe99012 

If the answer resolved your question, please consider marking it as the solution.

It helps others in the community find answers more easily.