how to include a third-party Maven package in MLflow model serving job cluster in Azure Databricks

Celia — Thu, 29 Jul 2021 21:11:45 GMT

We are trying to use MLflow Model Serving. This service enables real-time model serving behind a REST API interface; it launches a single-node cluster that hosts our model.

The issue occurs when the single-node cluster tries to prepare its environment based on the conda.yaml file that is created when the model is logged with MLflow. It looks like I can only specify pip packages there, not a Maven package.

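import cloudpickle
import sklearn
from mlflow.utils.environment import _mlflow_conda_env

# Build the conda environment that MLflow stores alongside the logged model.
conda_env = _mlflow_conda_env(
    additional_conda_deps=None,
    additional_pip_deps=[
        "cloudpickle=={}".format(cloudpickle.__version__),
        "scikit-learn=={}".format(sklearn.__version__),
        "pyspark==3.0.0",  # pinned explicitly
    ],
    additional_conda_channels=None,
)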
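
For context, here is a rough, hand-built sketch of the shape such an environment takes once it is written to conda.yaml. It does not call MLflow. The helper name build_conda_env, the environment name, the channel, and all version numbers except pyspark==3.0.0 are placeholders I made up for illustration, and the real file may list more packages. The point is only that the structure has a slot for conda packages and a nested slot for pip packages, and nothing that would hold Maven coordinates.

```python
# Hypothetical helper: approximates the layout of a conda environment file.
# It is NOT part of MLflow; names and versions are illustrative placeholders.
def build_conda_env(pip_deps, python_version="3.8.10"):
    return {
        "name": "mlflow-env",
        "channels": ["defaults"],
        "dependencies": [
            "python={}".format(python_version),  # conda package
            "pip",                               # conda package
            {"pip": list(pip_deps)},             # nested pip packages
        ],
    }


conda_env = build_conda_env([
    "cloudpickle==1.6.0",      # placeholder version
    "scikit-learn==0.24.1",    # placeholder version
    "pyspark==3.0.0",
])

# The only dependency sections are conda entries and the nested pip list.
pip_section = conda_env["dependencies"][-1]["pip"]
print(sorted(conda_env.keys()))
print(pip_section)
```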

How can I tell the cluster to install a Maven JAR file?

Re: how to include a third-party Maven package in MLflow model serving job cluster in Azure Databricks

sean_owen — Wed, 01 Sep 2021 17:45:16 GMT

I don't believe you can do that at the moment. Is it required for a Python model? Only Python-based models can really be served this way for now.

Re: how to include a third-party Maven package in MLflow model serving job cluster in Azure Databricks

BeardyMan — Tue, 14 Sep 2021 13:55:21 GMT

Unfortunately, we came across this same issue. We were trying to use MLflow Model Serving to produce an API that could take text input and pass it through some NLP. In this instance we had installed a Maven package on the cluster, so the experiment would run fine in a notebook, but MLflow serving would fail because it couldn't install the Maven package. As an alternative, it would help to be able to modify the job cluster that is provisioned, so that we could add the libraries and packages that cannot be specified in the conda definition.