How to include a third-party Maven package in an MLflow Model Serving job cluster in Azure Databricks
07-29-2021 02:11 PM
We are trying to use MLflow Model Serving, which exposes real-time model serving behind a REST API; it launches a single-node cluster to host our model.
The issue happens when that single-node cluster tries to prepare its environment from the conda.yaml file that was created when the model was logged with MLflow. It looks like I can only specify pip packages there, not Maven packages.
from mlflow.utils.environment import _mlflow_conda_env
import cloudpickle
import sklearn
import pyspark

# Conda environment that gets written to conda.yaml when the model is logged
conda_env = _mlflow_conda_env(
    additional_conda_deps=None,
    additional_pip_deps=[
        "cloudpickle=={}".format(cloudpickle.__version__),
        "scikit-learn=={}".format(sklearn.__version__),
        "pyspark=={}".format(pyspark.__version__),
    ],
    additional_conda_channels=None,
)

How can I tell the cluster to install a Maven jar file?
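For context, the conda environment above gets attached when the model is logged, roughly as in the sketch below (sk_model stands in for our fitted estimator and "model" is just the artifact path; only conda/pip dependencies end up in the resulting conda.yaml, so there is no place to declare Maven coordinates):

import mlflow.sklearn

# Log the model together with the custom conda environment defined above.
# MLflow Model Serving later rebuilds this environment on the serving cluster.
mlflow.sklearn.log_model(
    sk_model=sk_model,
    artifact_path="model",
    conda_env=conda_env,
)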
- Labels: MlFlow
09-01-2021 10:45 AM
I don't believe you can do that at the moment. Is it required for a Python model? Only Python-based models can really be served this way right now.
09-14-2021 06:55 AM
Unfortunately we came across this same issue. We were trying to use MLflow Serving to produce an API that could take text input and pass it through some NLP. In this instance we had installed a Maven package on the cluster, so the experiment would run fine in a notebook, but MLflow Serving would fail because it couldn't install the Maven package. As an alternative, it would help to be able to modify the job cluster that is provisioned, so we could add the additional libraries/packages that are required but cannot be specified in the conda definition; a rough sketch of what we have in mind follows below.
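The kind of thing we would want to do, sketched here against the standard Databricks Libraries API. This is untested for Model Serving: the workspace URL, token, cluster id, and Maven coordinates are placeholders, and it is an open question whether the auto-provisioned serving cluster accepts library installs this way at all.

import requests

# Placeholders: your workspace URL, a personal access token, and the cluster id
# of the serving cluster (assuming it can be looked up in the Clusters UI/API).
DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "dapiXXXXXXXX"
CLUSTER_ID = "0901-123456-abcd123"

# Ask Databricks to install a Maven library on that cluster.
# The coordinates below are just an example NLP package.
resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "cluster_id": CLUSTER_ID,
        "libraries": [
            {"maven": {"coordinates": "com.johnsnowlabs.nlp:spark-nlp_2.12:3.4.0"}}
        ],
    },
)
resp.raise_for_status()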

