
Using private Python packages with Databricks model serving

ericcbonet
New Contributor II

I am attempting to host a Python MLflow model using Databricks model serving. While the serving endpoint functions correctly without private Python packages, I am encountering difficulties when attempting to include them.

Context:

  • Without Private Packages: The serving endpoint works fine
  • With Private Packages: The endpoint only works if I set `--index-url` to my private PyPI server, as detailed in this answer and sketched below.
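
Roughly, that working-but-undesirable setup looks like the sketch below (the CodeArtifact URL mirrors the conda.yaml example further down, and the token is a placeholder; if I remember correctly, pip option lines such as `--index-url` are passed straight through into the generated requirements file):

pip_requirements=[
    # plain-text token embedded in the index URL, which is exactly what I want to avoid
    "--index-url https://aws:<PLAINTEXT_TOKEN>@company-0123456789.d.codeartifact.eu-central-1.amazonaws.com/pypi/company-python-packages/simple/",
    "company-private>=0.1.10",
],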

I wish to avoid storing my token for the private PyPI in plain text. Since init scripts are not supported with model serving, I don't know how to inject the token as a secret at build time. Is this possible?

Attempted Solution:

Following this tutorial, I built the `whl` files, uploaded them to DBFS, and listed them in `pip_requirements` in `mlflow.pyfunc.log_model`. Unfortunately, the file on DBFS cannot be found at build time, which prevents the endpoint from being created.
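
For reference, the wheel was copied into DBFS roughly like this (the source path is illustrative):

# copy the locally built wheel into DBFS (source path is illustrative)
dbutils.fs.cp(
    "file:/tmp/dist/private_package-0.1.10-py3-none-any.whl",
    "dbfs:/FileStore/tables/private_package-0.1.10-py3-none-any.whl",
)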

Code:

Here's how I'm logging the model:

 

mlflow.pyfunc.log_model(
    "hello-world",
    python_model=model,
    registered_model_name="hello-world",
    signature=signature,
    input_example=input_example,
    pip_requirements=[
        "/dbfs/FileStore/tables/private_package-0.1.10-py3-none-any.whl"
    ],
)
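
To double-check what MLflow actually records, downloading the generated requirements file should show the same path (a sketch; `<run_id>` is a placeholder for the run that logged the model):

import mlflow

# fetch the requirements.txt that MLflow generated for the logged "hello-world" model
reqs_path = mlflow.artifacts.download_artifacts("runs:/<run_id>/hello-world/requirements.txt")
print(open(reqs_path).read())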

I have tried different paths in `pip_requirements`, and the file's existence on DBFS has been verified through both the Databricks CLI and `dbutils`.

In `pip_requirements` I have tried:

- /dbfs/FileStore...
- dbfs/FileStore...
- /dbfs:/FileStore...
- dbfs:/FileStore...

Command to view the package in a Databricks notebook:

dbutils.fs.ls("dbfs:/FileStore/tables/private_package-0.1.10-py3-none-any.whl")
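
And to check the same file through the local FUSE mount (the `/dbfs/...` path that pip needs at install time), a minimal sketch:

import os

# /dbfs/... is the FUSE view of dbfs:/..., i.e. the path that ends up in requirements.txt
print(os.path.exists("/dbfs/FileStore/tables/private_package-0.1.10-py3-none-any.whl"))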


Error:

The build logs show the following error.

 

ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/dbfs/FileStore/tables/private_package-0.1.10-py3-none-any.whl'
CondaEnvException: Pip failed


My hypothesis is that there might be a permission issue and that Databricks model serving might not have access to DBFS. Being new to Databricks, I am unsure how to debug this. Any guidance or insights on how to resolve this would be greatly appreciated!

1 REPLY

Hi @Retired_mod, thanks for getting back to me.

The link you attached is for installing private pip packages in a notebook. As mentioned in my question, I can install my private package (which I uploaded to DBFS) in a notebook without issue. The problem I am having is installing this same package with model serving.

Running the command you gave me in a notebook results in a FileNotFoundException, while the directory is found with `dbutils`; see the screenshot below.

[screenshot: ericcbonet_0-1691310536630.png]

I copy-pasted the file path from my Databricks notebook into my Python code and tried a number of different path combinations. I always get the same error:

ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory

Furthermore, even if I can debug why the model serving Docker build environment cannot find the file on DBFS (which I suspect is permission related), I'm not happy with this workflow: I would have to keep uploading new versions of my private packages to DBFS and keep updating the wheel paths in the `pip_requirements` argument of `mlflow.pyfunc.log_model`.

What would make this process much easier is if a secret could be picked up by the build environment and injected into the `conda.yaml` file via an init script. For example:

# conda.yaml
channels:
  - defaults
dependencies:
  - python=3.10
  - pip
  - pip:
      - mlflow>=2.5.0
      - boto3>=1.28.18
      - company-private>=0.1.10
      - --index-url "https://aws:%%CODE_ARTIFACT_TOKEN%%@company-0123456789.d.codeartifact.eu-central-1.amazonaws.com/pypi/company-python-packages/simple/"
name: mlflow-serving

Then a `.sh` init script could do the following:

sed -i "s/%%CODE_ARTIFACT_TOKEN%%/${{ secrets.code-artifact-token }}/g" conda.yaml 

I realize model serving currently does not support init scripts; is this on the roadmap? Or can you suggest another workflow that lets me use private Python packages?
