Databricks

BeardyMan · ‎09-14-2021

We serve models on Azure Databricks and have received requests to capture additional logging. In some cases, the requesters want to capture the model's input and output, or even some of the intermediate steps of a pipeline.

Is there any way to extend the logging of an MLflow REST endpoint to capture this additional information?

Kaniz · ‎09-14-2021

Hi @BeardyMan! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, I will follow up shortly with a response.

Dan_Z · ‎09-14-2021

To my knowledge, if you write a custom model with its own predict() function, you can perform arbitrary operations in it, such as logging inputs or outputs somewhere.

BeardyMan · ‎09-14-2021

Do you mean using Azure Functions and custom Python code to call the model and then perform the required logging, rather than using the MLflow serving capability and the managed REST endpoint?

Dan_Z · ‎09-14-2021

My thought was:

1. Create a custom model whose predict() function does extra work (such as logging).
2. Register the model.
3. Serve the model with Model Serving.
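
As a rough illustration of the first step, here is a minimal sketch of a wrapper whose predict() logs each call's input and output before returning. It uses only the Python standard library; `LoggingModel`, `ThresholdModel`, the `model_io` logger name, and the `sink` parameter are hypothetical names made up for this sketch, and the input is assumed to be a plain list of rows. To register and serve such a wrapper, it would still need to be adapted to MLflow's custom model interface (for example `mlflow.pyfunc.PythonModel`), which is not shown here.

```python
import json
import logging
from typing import Any, Callable, List, Sequence

logger = logging.getLogger("model_io")  # hypothetical logger name


class LoggingModel:
    """Wraps any object exposing predict() and records each call's input and output."""

    def __init__(self, model: Any, sink: Callable[[str], None] = logger.info):
        self.model = model
        self.sink = sink  # where the JSON record goes; swap in an RPC or queue client

    def predict(self, X: Sequence[Sequence[float]]) -> List[Any]:
        predictions = list(self.model.predict(X))
        record = {"input": [list(row) for row in X], "output": predictions}
        # default=str keeps the record serializable for values json cannot encode natively
        self.sink(json.dumps(record, default=str))
        return predictions


class ThresholdModel:
    """Stand-in for a trained model, used only to make the sketch runnable."""

    def predict(self, X):
        return [int(sum(row) > 1.0) for row in X]


records: List[str] = []
wrapped = LoggingModel(ThresholdModel(), sink=records.append)
result = wrapped.predict([[0.2, 0.3], [0.9, 0.8]])
```

Passing the sink as a parameter keeps the logging destination separate from the model logic, so the same wrapper could write to a file, a queue, or a remote service without changes to predict().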

BeardyMan · ‎09-14-2021

Thank you for the clarification, I understand what you mean now and that's exactly what I was hoping for! 🙂

ChenranLi · ‎09-14-2021

Here is an example of a custom model based on the sklearn model "GradientBoostingClassifier":

import sklearn.ensemble


class CustomizedGradientBoostingClassifier(sklearn.ensemble.GradientBoostingClassifier):
  def __init__(self, random_state):
    super().__init__(random_state=random_state)

  def fit(self, X, y):
    # Return the fitted estimator, as scikit-learn's fit() convention expects
    return super().fit(X, y)

  def predict_proba(self, X_test):
    return super().predict_proba(X_test)

  def predict(self, X):
    # Do customized tasks here (e.g. issuing an RPC call to log the input and output)

    # For example, you can return the input along with the predicted result
    return (super().predict(X), X)

You can register the model as usual. When you invoke the REST endpoint, predict() runs your custom logic and returns the input alongside the predicted result.

BeardyMan · ‎09-14-2021

Thank you, @Chenran Li. The example is exceedingly helpful, and I will be sure to try it out!

Dan_Z · ‎09-14-2021

Another word from a Databricks employee:

"""

You can use the custom model approach, but configuring it is painful. Plus, you have to embed every loggable model in the custom model. A less intrusive solution would be to have a proxy server do the logging and then defer to the MLflow model server. See this very basic POC: https://github.com/amesar/mlflow-model-monitoring

Also check out Seldon Alibi for advanced monitoring.

"""
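
To make the proxy idea concrete, here is a minimal sketch of the core of such a proxy: it forwards the request body unchanged and records the input, output, and latency. This is not the code from the linked POC. `log_and_forward`, `send_upstream`, `fake_model_server`, the upstream URL, the `Content-Type` header, and the `{"data": ...}` payload are all assumptions made for illustration; the real request format depends on your MLflow version and model.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, List

# Hypothetical address of the MLflow model server the proxy sits in front of.
UPSTREAM_URL = "http://127.0.0.1:5000/invocations"


def send_upstream(body: bytes, url: str = UPSTREAM_URL) -> bytes:
    """Forward the raw request body to the model server and return its raw response."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def log_and_forward(
    body: bytes,
    forward: Callable[[bytes], bytes] = send_upstream,
    sink: Callable[[Dict], None] = print,
) -> bytes:
    """Forward one scoring request unchanged and record its input, output, and latency."""
    started = time.perf_counter()
    response = forward(body)
    sink(
        {
            "request": json.loads(body),
            "response": json.loads(response),
            "latency_ms": round((time.perf_counter() - started) * 1000, 3),
        }
    )
    return response


# Demonstration with a fake model server, so the sketch runs without a network.
def fake_model_server(body: bytes) -> bytes:
    rows = json.loads(body)["data"]
    return json.dumps([len(row) for row in rows]).encode()


audit_log: List[Dict] = []
reply = log_and_forward(b'{"data": [[1, 2], [3, 4, 5]]}', fake_model_server, audit_log.append)
```

In a real deployment, a function like this would sit inside whatever HTTP layer fronts the model server (a small web app, an Azure Function, or an API gateway policy), with `sink` pointing at your logging backend.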

BeardyMan · ‎09-15-2021

Thank you, Dan. We had originally suggested using Azure API Management or an Azure Function as an API wrapper that does the logging we want and then forwards the call to the MLflow model serving REST endpoint. I was just wondering whether there was a better alternative or something obvious we were missing.
