Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How do I invoke a data enrichment function before model.predict while serving the model

bhawik21
New Contributor II

I have used MLflow and got my model served through a REST API. It works fine when all model features are provided. But in my use case, only a single feature (the primary key) will be provided by the consumer application, and my code has to look up the other features from a database based on that key and then call model.predict to return the prediction. From my research I understand that the REST endpoint simply invokes the model.predict function. How can I make it invoke a data-massaging function before predicting?

ACCEPTED SOLUTION

-werners-
Esteemed Contributor III

You want a machine learning pipeline, part of what is called MLOps.

You can find a lot of material on this online.


4 REPLIES


bhawik21
New Contributor II

Thanks @Werner Stinckens. While a pipeline can condense data prep into an abstraction, the use case in question was about invoking a data-massaging function from within the model's predict function.

Now that I have solved it, I'd like to share the solution with the community. I designed it by wrapping the model's predict function in a wrapper class's predict method. Since this is a non-standard model, I used the pyfunc flavor to log the model in MLflow. By specifying the conda environment, the model-hosting cluster can install the required libraries and then run our custom model.

The main design aspect is that we replace the model's native predict function with a user-defined one. This can do anything, such as reading and preparing data, before going on to call the model's native predict method.
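A minimal sketch of that wrapper pattern. The class and table names here (`FeatureLookupModel`, the in-memory feature dict, `StubModel`) are illustrative assumptions; in MLflow you would subclass `mlflow.pyfunc.PythonModel` and log the wrapper with `mlflow.pyfunc.log_model`, as the post describes.

```python
# Sketch of the wrapper pattern: the outer predict() enriches the input
# before delegating to the wrapped model's native predict().
# With MLflow, this class would subclass mlflow.pyfunc.PythonModel and be
# logged via mlflow.pyfunc.log_model(python_model=..., conda_env=...).

class StubModel:
    """Stands in for the trained model; predicts the sum of the features."""
    def predict(self, rows):
        return [sum(r) for r in rows]

class FeatureLookupModel:
    """Wrapper whose predict() looks up the full feature row by primary key."""
    def __init__(self, model, feature_table):
        self.model = model
        # A plain dict stands in for the real database lookup.
        self.feature_table = feature_table

    def predict(self, keys):
        # Data-massaging step: enrich each primary key into a full feature row.
        rows = [self.feature_table[k] for k in keys]
        # Then call the wrapped model's native predict.
        return self.model.predict(rows)

features = {"cust-1": [1.0, 2.0], "cust-2": [3.0, 4.0]}
wrapped = FeatureLookupModel(StubModel(), features)
print(wrapped.predict(["cust-1", "cust-2"]))  # [3.0, 7.0]
```

The serving endpoint only ever calls the outer `predict`, so the consumer application needs to send nothing but the primary keys.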

Databricks also has a newer feature called Feature Store that can handle this kind of use case. It has APIs that natively perform a feature-table lookup and run predict on the fetched values.

-werners-
Esteemed Contributor III

nice!

LuisL
New Contributor II

You can create a custom endpoint for your REST API that handles the data massaging before calling the model.predict function. This endpoint can take the primary key as input, retrieve the additional features from the database based on that key, and then pass the complete set of features to model.predict.

You can use a web framework like Flask or FastAPI to create the custom endpoint. For example, you can write a function that retrieves the additional features from the database and calls model.predict, and then register this function as a route in your Flask or FastAPI app. The client application can then send a request to this custom endpoint with the primary key, and the endpoint will return the prediction based on the retrieved features.
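A sketch of the enrich-then-predict handler that such a route would call. The table schema, `StubModel`, and the in-memory SQLite database standing in for the real feature store are all illustrative assumptions:

```python
import sqlite3

# Sketch of the handler a Flask/FastAPI route would delegate to: look up
# the remaining features for a primary key, then call the model.
# The schema and StubModel are hypothetical; SQLite stands in for the
# real feature database.

class StubModel:
    """Stands in for the served model; predicts the sum of the features."""
    def predict(self, rows):
        return [sum(r) for r in rows]

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE features (pk TEXT PRIMARY KEY, f1 REAL, f2 REAL)")
    conn.execute("INSERT INTO features VALUES ('cust-1', 1.0, 2.0)")
    return conn

def enrich_and_predict(conn, model, pk):
    # Data-massaging step: fetch the full feature row for this key...
    row = conn.execute(
        "SELECT f1, f2 FROM features WHERE pk = ?", (pk,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown primary key: {pk}")
    # ...then hand the complete feature vector to the model.
    return model.predict([list(row)])[0]

conn = make_db()
print(enrich_and_predict(conn, StubModel(), "cust-1"))  # 3.0
```

In Flask, the route handler would simply parse the primary key from the request and return the result of `enrich_and_predict` as JSON.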

You can also package this logic as a custom MLflow pyfunc model, so that the standard serving endpoint invokes your custom predict method automatically.

You can also use the Serverless Framework or similar tools to deploy this function and expose it through an API gateway.
