bhawik21
New Contributor II

Thanks @Werner Stinckens. While a pipeline can condense data prep into an abstraction, the use case in question was about invoking a data-massaging function within the model's predict function.

Now that I've solved it, I'd like to share the solution with the community. I wrapped the model's predict function in a wrapper class's predict method. Since this is a non-standard model, I used the pyfunc flavor to log it in MLflow. Specifying the conda environment lets the model-hosting cluster install the required libraries so that it can run our custom model.

The main design aspect here is that we replace the model's native predict function with a user-defined one. The user-defined function can do anything, such as reading and preparing data, and then call the model's native predict method.
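
For anyone who wants to see the shape of it, here is a minimal sketch of the wrapper idea, not my actual code. `NativeModel`, `PreprocessingWrapper` and the `_prepare` step are made-up names for illustration, and the "massaging" is just filling missing values with zero. It subclasses `mlflow.pyfunc.PythonModel` when MLflow is installed and falls back to a plain class otherwise, so the snippet runs either way. Plain lists are used as input to keep it short; a served pyfunc model would typically receive a pandas DataFrame.

```python
try:
    # Real MLflow base class for custom pyfunc models.
    from mlflow.pyfunc import PythonModel
except ImportError:
    # Fallback so this sketch still runs where MLflow is not installed.
    PythonModel = object


class NativeModel:
    """Hypothetical stand-in for the trained model: doubles the sum of each row."""

    def predict(self, rows):
        return [2 * sum(row) for row in rows]


class PreprocessingWrapper(PythonModel):
    """Wraps a model so that data prep runs inside predict()."""

    def __init__(self, model):
        self.model = model

    def _prepare(self, model_input):
        # Hypothetical data massaging: replace missing values with 0.0.
        return [[0.0 if v is None else float(v) for v in row] for row in model_input]

    def predict(self, context, model_input, params=None):
        # Prepare the data first, then delegate to the model's native predict.
        return self.model.predict(self._prepare(model_input))


wrapped = PreprocessingWrapper(NativeModel())
predictions = wrapped.predict(None, [[1, None, 3], [4, 5, None]])
print(predictions)
```

With MLflow available, an instance like `wrapped` would be passed as the `python_model` argument of `mlflow.pyfunc.log_model`, together with a `conda_env` that lists the libraries the wrapper needs.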

Databricks has a fairly new feature called Feature Store that can handle this kind of use case. It has APIs that natively look up values from a Feature Table and run predict on the values fetched.