
Using private python packages with databricks model serving

ericcbonet
New Contributor II

I am attempting to host a Python MLflow model using Databricks model serving. While the serving endpoint functions correctly without private Python packages, I am encountering difficulties when attempting to include them.

Context:

  • Without Private Packages: The serving endpoint works fine
  • With Private Packages: I can only set `--index-url` to my private PyPI server, as detailed in this answer.

I wish to avoid storing my token for the private PyPI server in plain text. Since init scripts are not supported with model serving, I don't know how to inject the token as a secret at build time. Is this possible?
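
For concreteness, here is a sketch of the only approach that currently works for me: embedding the authenticated index URL directly in `pip_requirements`. `<TOKEN>` is a placeholder, and `model` is the same object logged in the full example further down; in practice the real token ends up in plain text in the logged requirements, which is exactly what I want to avoid.

import mlflow

# Works, but the token is stored in plain text in the model's requirements.
# <TOKEN> is a placeholder, not a real credential.
mlflow.pyfunc.log_model(
    "hello-world",
    python_model=model,
    pip_requirements=[
        "--index-url https://aws:<TOKEN>@company-0123456789.d.codeartifact.eu-central-1.amazonaws.com/pypi/company-python-packages/simple/",
        "company-private>=0.1.10",
    ],
)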

Attempted Solution:

Following this tutorial, I built the `whl` files, uploaded them to DBFS, and listed them in `pip_requirements` in `mlflow.pyfunc.log_model`. Unfortunately, the file on DBFS cannot be found at build time, which prevents the endpoint from being created.

Code:

Here's how I'm logging the model:

 

import mlflow

mlflow.pyfunc.log_model(
    "hello-world",
    python_model=model,
    registered_model_name="hello-world",
    signature=signature,
    input_example=input_example,
    pip_requirements=[
        # Path to the private wheel via the DBFS FUSE mount
        "/dbfs/FileStore/tables/private_package-0.1.10-py3-none-any.whl"
    ],
)

I have tried different paths in `pip_requirements`, and the file's existence on DBFS has been verified through both the Databricks CLI and `dbutils.fs.ls` in a notebook.

In `pip_requirements` I have tried:

- /dbfs/FileStore...
- dbfs/FileStore...
- /dbfs:/FileStore...
- dbfs:/FileStore...

Command to view the package in a Databricks notebook:

dbutils.fs.ls("dbfs:/FileStore/tables/private_package-0.1.10-py3-none-any.whl")
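
On a regular cluster both of the following checks pass, since DBFS is also mounted locally under `/dbfs` (the FUSE path that pip needs). This is only a debugging sketch; my suspicion is that the serving build environment lacks this mount.

import os

# DBFS URI, resolved through dbutils (dbutils is provided by the notebook runtime)
dbutils.fs.ls("dbfs:/FileStore/tables/private_package-0.1.10-py3-none-any.whl")

# The same file through the local /dbfs FUSE mount -- this is the path
# pip has to resolve at build time
print(os.path.exists("/dbfs/FileStore/tables/private_package-0.1.10-py3-none-any.whl"))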


Error:

The build logs produce the following error.

 

ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/dbfs/FileStore/tables/private_package-0.1.10-py3-none-any.whl'
CondaEnvException: Pip failed


My hypothesis is that there might be a permission error, and that Databricks model serving might not have access to DBFS. Being new to Databricks, I am unsure how to debug this. Any guidance or insights on how to resolve this issue would be greatly appreciated!

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @ericcbonet, the error message indicates that no such file or directory exists at '/dbfs/FileStore/tables/private_package-0.1.10-py3-none-any.whl'.

This error can occur when trying to install a package that does not exist in the specified directory.

To resolve this issue, you can try the following steps:

1. Check if the file exists in the specified directory by running %fs ls /dbfs/FileStore/tables/. If the file does not exist, you may need to upload it to that directory using the Databricks UI or CLI.

2. If the file exists in the specified directory, try installing the package again using %pip install /dbfs/FileStore/tables/private_package-0.1.10-py3-none-any.whl.

Sources:
https://docs.databricks.com/libraries/notebooks-python-libraries.html#install-a-private-package

Hi @Kaniz_Fatma, thanks for getting back to me.

The link you attached is for installing private pip packages in a notebook. As mentioned in my question, I can install my private package (which I uploaded to DBFS) in a notebook without issue. The problem I am having is installing this same package with model serving.

Running the command you gave me in a notebook results in a FileNotFoundException, while the file is found with dbutils; see the screenshot below.

[Screenshot: dbutils.fs.ls output showing the wheel exists on DBFS, while the %fs ls command raises a FileNotFoundException]

I copy-pasted the file path from my Databricks notebook into my Python code and tried a number of different path combinations. I always get the same error:

ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory

Furthermore, even if I can debug why the model serving Docker build environment cannot find the file on DBFS (which I suspect is permission related), I'm not happy with this workflow: having to upload updated private Python packages to DBFS and then update the link in the `pip_requirements` argument of `mlflow.pyfunc.log_model`.
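
A hypothetical helper along these lines would at least keep the wheel upload and the requirements path in sync (`publish_wheel` is my own invention, not a Databricks API, and the local build path is an assumption):

from pathlib import Path

# Hypothetical helper: upload a freshly built wheel to DBFS and return the
# FUSE path to list in pip_requirements, so a version bump only changes one place.
def publish_wheel(local_wheel: str) -> str:
    name = Path(local_wheel).name
    dbutils.fs.cp(f"file:{local_wheel}", f"dbfs:/FileStore/tables/{name}")
    return f"/dbfs/FileStore/tables/{name}"

# Assumed local build output path
requirement = publish_wheel("/tmp/dist/private_package-0.1.10-py3-none-any.whl")

But that only papers over the duplication; it doesn't fix the underlying problem.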

What would make this process much easier is if a secret could be picked up by the build environment and injected into the `conda.yaml` file via an init script. For example:

# conda.yaml
channels:
  - defaults
dependencies:
  - python=3.10
  - pip
  - pip:
      - mlflow>=2.5.0
      - boto3>=1.28.18
      - company-private>=0.1.10
      - --index-url "https://aws:%%CODE_ARTIFACT_TOKEN%%@company-0123456789.d.codeartifact.eu-central-1.amazonaws.com/pypi/company-python-packages/simple/"
name: mlflow-serving

Then a `.sh` init script could do the following:

sed -i "s/%%CODE_ARTIFACT_TOKEN%%/${{ secrets.code-artifact-token }}/g" conda.yaml 
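
Equivalently in Python, assuming the (hypothetical) build environment exposed the secret as an environment variable:

import os
from pathlib import Path

# Same substitution as the sed one-liner above; CODE_ARTIFACT_TOKEN is an
# assumed environment variable, not something model serving provides today.
token = os.environ["CODE_ARTIFACT_TOKEN"]
conda = Path("conda.yaml")
conda.write_text(conda.read_text().replace("%%CODE_ARTIFACT_TOKEN%%", token))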

I realize model serving currently does not support init scripts. Is this on the roadmap? Or can you suggest another workflow so that I can use private Python packages?
