
Model Serving Endpoints - Build configuration and Interactive access

rasgaard
New Contributor

Hi there 🙂

I have used Databricks Model Serving Endpoints to serve a model that depends on some config files and a custom library. The library has been included by logging the model with the `code_path` argument in `mlflow.pyfunc.log_model`, and it works perfectly fine. I wanted to do the same with the config files, but I couldn't work out where exactly the MLflow model gets copied to inside the container that the Model Serving Endpoint builds.

After a bit of debugging I figured out that for local builds with `mlflow models build-docker`, the model files are copied to `/opt/ml/model/`, so I imagined that Model Serving Endpoints used the same command under the hood. That assumption turned out to be wrong: on the Serving Endpoint build, the model files end up in `/model/` instead.
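
For anyone hitting the same thing, this is roughly how I did the local inspection (a sketch; the model URI and image name are placeholders):

mlflow models build-docker -m "runs:/<run-id>/model" -n my-model-image
docker run -it --rm --entrypoint /bin/bash my-model-image

From the shell inside the container you can `ls /opt/ml/model/` and see exactly where everything lands.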

So my question is: how, or where, do I get insight into the build process for Model Serving Endpoint builds? The placement of the files seems to come from a custom Dockerfile that I can't find any specification or documentation for. It would also be amazing to have interactive access to the container that hosts the Serving Endpoint, as that would make debugging a whole lot easier.


Thanks in advance 🙂

1 REPLY

robbe
New Contributor III

Hi @rasgaard, one way to achieve that without inspecting the container is to use MLflow artifacts. Artifacts allow you to log files together with your models and reference them inside the endpoint.

For example, let's assume that you need to include a YAML config file that controls the preprocessing of your model's inputs in the endpoint. In your training script you have:

artifacts = {
    "model_path": model_path,  # Path to the serialised model file
    "pipeline_config_path": pipeline_config_path,  # Path to the preprocessing config file
}

model_info = mlflow.pyfunc.log_model(
    artifact_path="model",
    python_model=ModelWrapper(),
    artifacts=artifacts,
    code_path=[<your-code-path>],
    ...  # Additional arguments to model logging
)
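
When the model is logged, MLflow copies each artifact into the model's directory and rewrites the paths, so at serving time `context.artifacts` points at the copied files rather than your original locations. A quick way to sanity-check this before deploying is to load the model back and call it locally (a minimal sketch; the dummy input is purely illustrative, use whatever matches your model's signature):

import mlflow
import pandas as pd

# Loading the model back runs load_context, so broken artifact paths fail here
# rather than inside the serving container.
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)

# Illustrative dummy input.
print(loaded_model.predict(pd.DataFrame({"feature_1": [0.1, 0.2]})))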

And in the ModelWrapper class:

import mlflow
import yaml
from mlflow.pyfunc import PythonModel

from your_code import preprocess_inputs


class ModelWrapper(PythonModel):
    """Wrapper around the model class.

    It allows the custom model to be registered as a customised MLflow model with the
    "python_function" ("pyfunc") flavor, leveraging custom inference logic and artifact
    dependencies.
    """

    def __init__(self) -> None:
        """Initialise the wrapper."""
        self.model = None
        self.pipeline_config = None

    def load_context(self, context: mlflow.pyfunc.PythonModelContext) -> None:
        """Load the model from the context.

        Args:
            context (PythonModelContext): Instance containing artifacts that the model
                can use to perform inference.
        """
        from joblib import load

        self.model = load(context.artifacts["model_path"])

        pipeline_config_path = context.artifacts["pipeline_config_path"]
        with open(pipeline_config_path) as f:
            self.pipeline_config = yaml.safe_load(f)  # We assume that pipeline_config is a dict

    def predict(self, context: mlflow.pyfunc.PythonModelContext, model_input):
        """Make predictions using the wrapper.

        Args:
            context (PythonModelContext): Instance containing artifacts that the model
                can use to perform inference.
            model_input: Model inputs for which to generate predictions.

        Returns:
            The predictions of the estimator on the inputs.
        """
        inputs = preprocess_inputs(model_input, **self.pipeline_config)
        return self.model.predict(inputs)
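
For completeness, the config file and `preprocess_inputs` could look something like this (entirely hypothetical, just to show how the YAML keys arrive as keyword arguments via `**self.pipeline_config`, assuming a pandas DataFrame input):

# preprocessing.yaml (example contents):
#
#   drop_columns:
#     - "id"
#   clip_max: 10.0


def preprocess_inputs(model_input, drop_columns=None, clip_max=None):
    """Toy preprocessing step; each key in the YAML config becomes a keyword argument."""
    if drop_columns:
        model_input = model_input.drop(columns=drop_columns)
    if clip_max is not None:
        model_input = model_input.clip(upper=clip_max)
    return model_input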

You can find more info here: https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#creating-custom-pyfunc-models

Hopefully this is of help, let me know!
