Databricks Community

BeardyMan · ‎09-14-2021

When serving a model on Azure Databricks, we have received requests to capture additional logging. In some instances, the request is to capture the input and output, or even some of the intermediate steps of a pipeline.

Is there any way to extend the logging of an MLflow REST endpoint to capture this additional information?

ChenranLi · ‎09-14-2021

Here is an example of a custom model based on the sklearn model "GradientBoostingClassifier":

import sklearn.ensemble


class CustomizedGradientBoostingClassifier(sklearn.ensemble.GradientBoostingClassifier):
    def __init__(self, random_state):
        super().__init__(random_state=random_state)

    def fit(self, X, y):
        # Return the fitted estimator, as sklearn's fit() convention expects
        return super().fit(X, y)

    def predict_proba(self, X_test):
        return super().predict_proba(X_test)

    def predict(self, X):
        # Do customized tasks here (e.g. issuing an RPC call to log the input and output)

        # For example, you can return not only the predicted result, but also the input
        return (super().predict(X), X)

You can register the model as usual. When you invoke the REST endpoint, the custom logic in predict() runs, and the response contains both the predicted result and the input.

Dan_Z · ‎09-14-2021

To my knowledge, if you write a custom model, its predict() function can perform arbitrary operations, such as logging the inputs or outputs somewhere.

BeardyMan · ‎09-14-2021

Do you mean using Azure Functions and custom Python code to call the model and perform the required logging, rather than using the MLflow serving capability and the managed REST endpoint?

Dan_Z · ‎09-14-2021

My thought was:

1. Create a custom model with a predict function that does extra work (like logging)
2. Register the model
3. Run the model in Model Serving
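
The first step above can be sketched roughly as follows. This is a minimal, standard-library-only illustration rather than a definitive implementation: `LoggingModelWrapper`, `ThresholdModel`, and the logger name `model_io` are hypothetical names made up for the example. With MLflow, the same delegation would typically sit inside a subclass of `mlflow.pyfunc.PythonModel`, whose `predict(self, context, model_input)` method would do the logging before returning. Where the log records end up once the model is served depends on how logging is configured in the serving environment, which this sketch does not cover.

```python
import json
import logging

logger = logging.getLogger("model_io")  # hypothetical logger name


class LoggingModelWrapper:
    """Wraps any object exposing predict() and logs each call's input and output."""

    def __init__(self, model):
        self.model = model

    def predict(self, X):
        predictions = self.model.predict(X)
        # One JSON record per call; default=str keeps non-JSON types from raising.
        logger.info(json.dumps({"input": X, "output": predictions}, default=str))
        return predictions


class ThresholdModel:
    """Hypothetical stand-in for a trained model."""

    def predict(self, X):
        return [1 if x > 0.5 else 0 for x in X]


logging.basicConfig(level=logging.INFO)
wrapped = LoggingModelWrapper(ThresholdModel())
print(wrapped.predict([0.2, 0.9]))  # prints [0, 1]
```

The wrapper leaves the wrapped model untouched, so the same class can be reused for any model that exposes predict().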

zainabs · ‎08-16-2024

Hey Dan, we do that, but in my case I don't see the logs in the event logs tab. Where could they be?

BeardyMan · ‎09-14-2021

Thank you for the clarification, I understand what you mean now and that's exactly what I was hoping for! 🙂

BeardyMan · ‎09-14-2021

Thank you @Chenran Li, the example is exceedingly helpful. I will be sure to try this out!

Dan_Z · ‎09-14-2021

Another word from a Databricks employee:

"""

You can use the custom model approach, but configuring it is painful. Plus, you have to embed every loggable model in the custom model. Another, less intrusive solution would be to have a proxy server do the logging and then defer to the MLflow model server. See this very basic POC: https://github.com/amesar/mlflow-model-monitoring

Also check out Seldon Alibi for advanced monitoring.

"""
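
The proxy idea in the quote can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the code from the linked POC: `forward`, `make_proxy_handler`, `serve`, and the logger name `scoring_proxy` are hypothetical, the upstream is assumed to accept POST requests (an MLflow model server scores on `/invocations`) and to answer with JSON, and there is no authentication, timeout, or retry handling.

```python
import json
import logging
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

logger = logging.getLogger("scoring_proxy")  # hypothetical logger name


def forward(upstream_url, path, body, content_type):
    """POST body to the upstream server; return (status, payload), also for HTTP errors."""
    request = urllib.request.Request(
        upstream_url + path,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def make_proxy_handler(upstream_url):
    """Build a handler class that logs each request/response pair and forwards it."""

    class ProxyHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            content_type = self.headers.get("Content-Type", "application/json")
            status, payload = forward(upstream_url, self.path, body, content_type)
            # Log before replying, so the record exists once the client has its answer.
            logger.info(json.dumps({
                "path": self.path,
                "request": body.decode("utf-8", "replace"),
                "status": status,
                "response": payload.decode("utf-8", "replace"),
            }))
            self.send_response(status)
            self.send_header("Content-Type", "application/json")  # assumes JSON responses
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format, *args):
            pass  # silence the default per-request access log on stderr

    return ProxyHandler


def serve(listen_port, upstream_url):
    """Run the logging proxy until interrupted."""
    HTTPServer(("127.0.0.1", listen_port), make_proxy_handler(upstream_url)).serve_forever()
```

For instance, `serve(8080, "http://127.0.0.1:5000")` would accept scoring requests on port 8080 and pass them to a model server assumed to be listening on port 5000. Clients then call the proxy instead of the model server, and the model itself needs no changes.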

BeardyMan · ‎09-15-2021

Thank you, Dan. We had originally suggested using Azure API Management or an Azure Function as an API wrapper that does the logging we want and then forwards the call to the MLflow model serving REST endpoint. I was just wondering whether there was a better alternative or something obvious we were missing.
