Using private Python packages with Databricks Model Serving
08-03-2023 02:50 PM
I am attempting to host a Python MLflow model using Databricks model serving. While the serving endpoint functions correctly without private Python packages, I am encountering difficulties when attempting to include them.
Context:
- Without Private Packages: The serving endpoint works fine.
- With Private Packages: The only approach that works is setting `--index-url` to my private PyPI server, as detailed in this answer.
I wish to avoid storing my token for the private PyPI server in plain text. Since init scripts are not supported with model serving, I don't know how to inject the token as a secret at build time. Could this be possible?
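To make the concern concrete, here is roughly what the working setup forces me to do (an illustrative sketch; the index URL uses the same CodeArtifact placeholders as the conda.yaml later in this thread):

# What I want to avoid: the CodeArtifact token ends up in plain text in the
# requirements MLflow logs with the model, readable by anyone with access to
# the model artifacts. ("token" is a placeholder for the real credential.)
token = "<plain-text-token>"
pip_requirements = [
    f"--index-url https://aws:{token}@company-0123456789.d.codeartifact.eu-central-1.amazonaws.com/pypi/company-python-packages/simple/",
    "company-private>=0.1.10",
]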
Attempted Solution:
Following this tutorial, I built the `whl` files, uploaded them to DBFS, and listed them in `pip_requirements` in `mlflow.pyfunc.log_model`. Unfortunately, the file on DBFS cannot be found at build time, which prevents the endpoint from being created.
Code:
Here's how I'm logging the model:
mlflow.pyfunc.log_model(
    "hello-world",
    python_model=model,
    registered_model_name="hello-world",
    signature=signature,
    input_example=input_example,
    pip_requirements=[
        "/dbfs/FileStore/tables/private_package-0.1.10-py3-none-any.whl"
    ],
)
The file's existence on DBFS has been verified with both the Databricks CLI and `dbutils` in a notebook. In `pip_requirements` I have tried the following paths:
- /dbfs/FileStore...
- dbfs/FileStore...
- /dbfs:/FileStore...
- dbfs:/FileStore...
Command to view the package in a Databricks notebook:
dbutils.fs.ls("dbfs:/FileStore/tables/private_package-0.1.10-py3-none-any.whl")
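Note that `dbutils.fs.ls` resolves `dbfs:/` through the DBFS API, while pip needs the local `/dbfs/` FUSE mount. The FUSE path can be checked from a notebook too (a sketch):

import os

# dbutils sees the file via the DBFS API; pip at build time needs the FUSE
# path. If this passes in a notebook but the serving build still fails, the
# build container presumably has no /dbfs mount.
print(os.path.exists("/dbfs/FileStore/tables/private_package-0.1.10-py3-none-any.whl"))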
Error:
The build logs show the following error:
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/dbfs/FileStore/tables/private_package-0.1.10-py3-none-any.whl'
CondaEnvException: Pip failed
My hypothesis is that there might be a permission error and that Databricks model serving might not have access to DBFS. Being new to Databricks, I am unsure how to debug this. Any guidance or insights on how to resolve this issue would be greatly appreciated!
08-06-2023 01:38 AM
Hi @Retired_mod, thanks for getting back to me.
The link you attached is for installing private pip packages in a notebook. As mentioned in my question, I can install my private package (that I uploaded to DBFS) in a notebook without issue. The problem I am having is installing this same package with model serving.
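For reference, this is roughly the notebook install that works for me (same wheel as in my original post):

%pip install /dbfs/FileStore/tables/private_package-0.1.10-py3-none-any.whl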
Running the command you gave me in a notebook results in a FileNotFoundException, while the file is found with `dbutils`; see the screenshot below.
I copy-pasted the file path from my Databricks notebook into my Python code and tried a number of different path combinations. I always get the same error:
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory
Furthermore, even if I can debug why the model serving Docker build environment cannot find the file on DBFS (which I suspect is permission related), I'm not happy with this workflow: having to upload private Python packages to DBFS and update the path in the `pip_requirements` argument of `mlflow.pyfunc.log_model` for every release.
What would make this process much easier is if a secret could be picked up by the build environment and injected into the `conda.yaml` file via an init script. For example:
# conda.yaml
channels:
  - defaults
dependencies:
  - python=3.10
  - pip
  - pip:
      - mlflow>=2.5.0
      - boto3>=1.28.18
      - company-private>=0.1.10
      - --index-url "https://aws:%%CODE_ARTIFACT_TOKEN%%@company-0123456789.d.codeartifact.eu-central-1.amazonaws.com/pypi/company-python-packages/simple/"
name: mlflow-serving
Then a `.sh` init script could do the following:
sed -i "s/%%CODE_ARTIFACT_TOKEN%%/${{ secrets.code-artifact-token }}/g" conda.yaml
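For completeness, the same substitution could also be done in Python, assuming the build environment exposed the secret as an environment variable (the `CODE_ARTIFACT_TOKEN` name is the same placeholder as in the conda.yaml above):

import os
import pathlib

# Hypothetical: assumes the platform injects the secret as an env var at build
# time. Replaces the %%CODE_ARTIFACT_TOKEN%% placeholder in conda.yaml in place.
token = os.environ["CODE_ARTIFACT_TOKEN"]
conda_yaml = pathlib.Path("conda.yaml")
conda_yaml.write_text(conda_yaml.read_text().replace("%%CODE_ARTIFACT_TOKEN%%", token))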
I realize model serving currently does not support init scripts; is this on the roadmap? Or can you suggest another workflow so I can use private Python packages?

