09-14-2021 06:59 AM
When serving a model on Azure Databricks, we have received requests to capture additional logging. In some cases, users would like to capture the inputs and outputs, or even some of the intermediate steps from a pipeline.
Is there any way to extend the logging of an MLflow REST endpoint to capture this additional information?
09-14-2021 02:37 PM
Here is an example of a custom model based on the sklearn model "GradientBoostingClassifier":
import sklearn.ensemble

class CustomizedGradientBoostingClassifier(sklearn.ensemble.GradientBoostingClassifier):
    def __init__(self, random_state):
        super().__init__(random_state=random_state)

    def fit(self, X, y):
        # Return the fitted estimator, following the scikit-learn convention.
        return super().fit(X, y)

    def predict_proba(self, X_test):
        return super().predict_proba(X_test)

    def predict(self, X):
        # Do customized tasks here (e.g. issuing an RPC call to log the input and output).
        # For example, you can return not only the predicted result, but also the input.
        return (super().predict(X), X)
You can register the model as usual. When you invoke the REST endpoint, the predict() function runs your custom logic and returns not only the predicted result, but also the input.
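If you would rather not subclass each estimator, the same idea can be sketched as a thin wrapper around any object that has a predict() method. This is only an illustration under assumptions: LoggingModelWrapper, _to_jsonable, and the "model_io" logger name are hypothetical names, not part of MLflow or scikit-learn, and where the log records end up depends on how logging is configured in your serving environment. To serve it through MLflow you would likely need to adapt it to a custom pyfunc model (mlflow.pyfunc.PythonModel), which is not shown here.

```python
import json
import logging

# Hypothetical logger name; attach whatever handler your environment needs.
logger = logging.getLogger("model_io")


def _to_jsonable(value):
    # NumPy arrays expose tolist(); other values are passed through unchanged.
    return value.tolist() if hasattr(value, "tolist") else value


class LoggingModelWrapper:
    """Wraps any object exposing predict() and logs each call's input and output."""

    def __init__(self, model, log=logger):
        self._model = model
        self._log = log

    def predict(self, X):
        y = self._model.predict(X)
        record = {"input": _to_jsonable(X), "output": _to_jsonable(y)}
        # default=str keeps the log call from failing on non-JSON types.
        self._log.info(json.dumps(record, default=str))
        return y
```

Unlike the example above, this wrapper returns only the prediction, so callers of the endpoint see an unchanged response while the input and output go to the log.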
09-14-2021 11:19 AM
To my knowledge, if you write a custom model's predict() function, you can do any arbitrary operations in it (log inputs or outputs somewhere).
09-14-2021 12:32 PM
Do you mean using Azure Functions and custom Python code to call the model and perform the required logging, rather than using the MLflow serving capability and the managed REST endpoint?
09-14-2021 02:05 PM
My thought was:
08-16-2024 10:36 AM
Hey Dan, we do that, but in my case I don't see the logs in the event logs tab. Where could they be?
09-14-2021 03:46 PM
Thank you for the clarification, I understand what you mean now and that's exactly what I was hoping for! 🙂
09-14-2021 03:46 PM
Thank you @Chenran Li, the example is exceedingly helpful. I will be sure to try this out!
09-14-2021 06:14 PM
Another word from a Databricks employee:
"""
You can use the custom model approach, but configuring it is painful. Plus, you have to wrap every loggable model in the custom model. Another, less intrusive solution would be to have a proxy server do the logging and then defer to the MLflow model server. See this very basic POC: https://github.com/amesar/mlflow-model-monitoring
Also check out Seldon Alibi for advanced monitoring.
"""
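To make the proxy idea concrete, here is a minimal sketch using only the Python standard library. It is not the code from the linked POC. The names make_proxy_handler, serve, and the "scoring_proxy" logger are hypothetical, as are the port numbers. The proxy accepts a POST, forwards the body to the model server, logs the request and response as one JSON record, and relays the response to the caller.

```python
import json
import logging
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical logger name; point its handlers at your log store of choice.
logger = logging.getLogger("scoring_proxy")


def make_proxy_handler(upstream_url):
    """Build a handler class that logs each call and forwards it to upstream_url."""

    class ProxyHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            content_type = self.headers.get("Content-Type", "application/json")
            request = urllib.request.Request(
                upstream_url,
                data=body,
                headers={"Content-Type": content_type},
                method="POST",
            )
            try:
                with urllib.request.urlopen(request) as upstream:
                    status, payload = upstream.status, upstream.read()
            except urllib.error.HTTPError as err:
                # Relay upstream error responses instead of crashing the handler.
                status, payload = err.code, err.read()
            logger.info(json.dumps({
                "request": body.decode("utf-8", "replace"),
                "response": payload.decode("utf-8", "replace"),
                "status": status,
            }))
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format, *args):
            # Suppress the default per-request stderr line.
            pass

    return ProxyHandler


def serve(upstream_url, port=8080):
    """Run the proxy until interrupted (port 8080 is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), make_proxy_handler(upstream_url)).serve_forever()
```

Assuming a model server started locally with MLflow is listening on port 5000, serve("http://localhost:5000/invocations") would put the proxy in front of it. Authentication, timeouts, and connection errors are left out to keep the sketch short.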
09-15-2021 12:54 PM
Thank you, Dan. We had originally suggested using Azure API Management or an Azure Function as an API wrapper to do the logging we want and then forward the call to the MLflow model serving REST endpoint. I was just wondering if there was a better alternative or something obvious we were missing.