Data Engineering

MLflow Serve Logging

BeardyMan
New Contributor III

When serving a model on Azure Databricks, we have received requests to capture additional information in the logs. In some cases, the requesters would like to capture each request's input and output, or even some of the intermediate steps of a pipeline.

Is there any way to extend the logging of an MLflow REST endpoint to capture this additional information?

Accepted Solution

ChenranLi
New Contributor III

Here is an example of a custom model based on the sklearn model "GradientBoostingClassifier":

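import pandas as pd
import sklearn.ensemble


class CustomizedGradientBoostingClassifier(sklearn.ensemble.GradientBoostingClassifier):
  def __init__(self, random_state=None):
    super().__init__(random_state=random_state)

  def fit(self, X, y):
    super().fit(X, y)
    return self  # scikit-learn convention: fit() returns the fitted estimator

  def predict_proba(self, X_test):
    return super().predict_proba(X_test)

  def predict(self, X):
    # Do customized tasks here (e.g. issuing an RPC call to log the input and output)
    predictions = super().predict(X)

    # For example, you can also return not only the predicted result, but also the input.
    # A single DataFrame (input columns plus a "prediction" column) is returned instead
    # of a tuple so that the serving endpoint can serialize the response to JSON.
    result = pd.DataFrame(X).copy()
    result["prediction"] = predictions
    return result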

You can log and register the model as usual. When you invoke the REST endpoint, the custom logic in predict() runs, and the response contains the input as well as the predicted result.
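For completeness, here is a hedged sketch of how a client might build the call to the REST endpoint using only the Python standard library. It assumes MLflow's `dataframe_split` JSON input format (accepted by MLflow 2.x scoring servers) and a Databricks Model Serving URL of the form `<workspace>/serving-endpoints/<endpoint-name>/invocations` with bearer-token authentication. The host, endpoint name, and token are hypothetical placeholders, and `build_invocation_request` is an illustrative helper, not part of any library. The request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own workspace host,
# serving endpoint name, and personal access token.
WORKSPACE_HOST = "https://example.cloud.databricks.com"
ENDPOINT_NAME = "my-logging-model"
TOKEN = "dapi-REPLACE-ME"


def build_invocation_request(columns, rows, host=WORKSPACE_HOST,
                             endpoint=ENDPOINT_NAME, token=TOKEN):
    """Build (but do not send) a POST request to a model serving endpoint.

    The body uses the "dataframe_split" JSON format: a list of column
    names plus a list of rows, each row given in column order.
    """
    payload = {"dataframe_split": {"columns": list(columns),
                                   "data": [list(row) for row in rows]}}
    url = f"{host}/serving-endpoints/{endpoint}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_invocation_request(["f0", "f1"], [[0.1, 2.0], [1.5, -0.3]])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending it with `urllib.request.urlopen(req)` returns the endpoint's JSON response. Older serving setups may expose a different URL path, so the URL shown for your own endpoint takes precedence over the pattern assumed here.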

9 Replies

Dan_Z
Databricks Employee

To my knowledge, if you write a custom model's predict() function, you can perform arbitrary operations in it, such as logging the inputs or outputs somewhere.
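Because predict() can run arbitrary code, it can also log the intermediate results of a multi-step pipeline, which addresses the original request to capture some pipeline steps. This is a minimal, standard-library-only illustration under stated assumptions: the pipeline is modeled as a hypothetical list of `(name, function)` pairs rather than a real scikit-learn `Pipeline`, and each step's output is emitted as one JSON line through Python's `logging` module to stderr. `run_pipeline_with_logging` and the step names are made up for illustration.

```python
import json
import logging
import sys
import time

# Emit each log record as a bare message (one JSON line) on stderr.
logger = logging.getLogger("pipeline_steps")
if not logger.handlers:
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
logger.setLevel(logging.INFO)


def run_pipeline_with_logging(steps, data, request_id="n/a"):
    """Apply each (name, fn) step in order, logging every intermediate output."""
    for index, (name, fn) in enumerate(steps):
        start = time.perf_counter()
        data = fn(data)
        logger.info(json.dumps({
            "request_id": request_id,
            "step": index,
            "name": name,
            "output": data,
            "elapsed_ms": round((time.perf_counter() - start) * 1000, 3),
        }, default=str))  # default=str keeps non-JSON types from raising
    return data


# Hypothetical three-step pipeline operating on a list of numbers.
steps = [
    ("scale", lambda xs: [x / 10 for x in xs]),
    ("clip", lambda xs: [min(max(x, 0.0), 1.0) for x in xs]),
    ("threshold", lambda xs: [int(x >= 0.5) for x in xs]),
]

if __name__ == "__main__":
    print(run_pipeline_with_logging(steps, [3, 7, 12, -4], request_id="req-1"))
```

Inside a custom model, predict() would call something like `run_pipeline_with_logging(steps, X, request_id=...)` instead of running the steps silently. Full intermediate outputs can be large, so in practice you may prefer to log summaries such as row counts or shapes.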

BeardyMan
New Contributor III

Do you mean using Azure Functions and custom Python code to call the model and then perform the required logging, rather than using MLflow's serving capability and the managed REST endpoint?

Dan_Z
Databricks Employee

My thought was:

  1. Create a custom model whose predict() function does extra work (such as logging)
  2. Register the model
  3. Run the model in Model Serving
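Step 1 might look like the following minimal sketch, assuming the `mlflow.pyfunc.PythonModel` interface, whose `predict` method receives `(context, model_input)` (newer MLflow versions also pass an optional `params` argument). When MLflow is not installed, the class falls back to `object` so the logging logic can be tried locally. `LoggingModel`, `inner_model` (any already-trained object with a `predict` method), and the logged field names are hypothetical.

```python
import json
import logging
import time
import uuid

try:  # Use MLflow's base class when MLflow is installed ...
    from mlflow.pyfunc import PythonModel
except ImportError:  # ... otherwise fall back so the sketch still runs locally.
    PythonModel = object

logging.basicConfig(format="%(message)s")  # root handler writes to stderr
logger = logging.getLogger("model_requests")
logger.setLevel(logging.INFO)


def _jsonable(value):
    """Best-effort conversion of common model inputs/outputs to JSON-friendly types."""
    if hasattr(value, "columns") and hasattr(value, "to_dict"):  # pandas DataFrame
        return value.to_dict(orient="records")
    if hasattr(value, "tolist"):  # NumPy array or pandas Series
        return value.tolist()
    return value


class LoggingModel(PythonModel):
    """Wrap an already-trained model and log each request's input, output, and latency."""

    def __init__(self, inner_model):
        self.inner_model = inner_model

    def predict(self, context, model_input, params=None):
        request_id = str(uuid.uuid4())
        start = time.perf_counter()
        output = self.inner_model.predict(model_input)
        logger.info(json.dumps({
            "request_id": request_id,
            "input": _jsonable(model_input),
            "output": _jsonable(output),
            "latency_ms": round((time.perf_counter() - start) * 1000, 3),
        }, default=str))  # default=str: anything else is logged as its string form
        return output
```

For steps 2 and 3, the wrapper would be logged with something like `mlflow.pyfunc.log_model(..., python_model=LoggingModel(trained_model))`, registered, and served as usual. The exact `log_model` arguments have changed across MLflow releases, so check them against the version you use.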

zainabs
New Contributor II

Hey Dan, we do that, but in my case I don't see the logs in the event logs tab. Where could they be?

BeardyMan
New Contributor III

Thank you for the clarification. I understand what you mean now, and that's exactly what I was hoping for! 🙂

BeardyMan
New Contributor III

Thank you @Chenran Li, the example is exceedingly helpful. I will be sure to try this out!

Dan_Z
Databricks Employee

Another word from a Databricks employee:

"""

You can use the custom model approach, but configuring it is painful. Plus, you have to embed every loggable model in the custom model. A less intrusive solution is to have a proxy server do the logging and then forward the request to the MLflow model server. See this very basic POC: https://github.com/amesar/mlflow-model-monitoring

Also check out Seldon Alibi for advanced monitoring.

"""

BeardyMan
New Contributor III

Thank you, Dan. We had originally suggested using Azure API Management or an Azure Function as an API wrapper that does the logging we want and then forwards the call to the MLflow model serving REST endpoint. I was just wondering whether there was a better alternative or something obvious we were missing.
