
Using code_path in mlflow.pyfunc models on Databricks

Idan
New Contributor II

We are using Databricks on AWS infrastructure and registering models with mlflow. We write our in-project imports as from src.(module location) import (objects).

Following examples online, I expected that calling mlflow.pyfunc.log_model(...code_path=['PROJECT_ROOT/src'], ...) would add the entire code tree to the model's runtime environment and thus allow us to keep our imports as they are.
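
To make this concrete, here is a minimal sketch of the kind of call we are making. ModelWrapper and the paths below are placeholders; our real wrapper lives under src/ and uses the from src.... imports described above.

import mlflow
import mlflow.pyfunc

class ModelWrapper(mlflow.pyfunc.PythonModel):
    # Placeholder: the real wrapper is defined under src/ and does
    # "from src.<module> import <objects>" inside its methods.
    def predict(self, context, model_input):
        return model_input

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=ModelWrapper(),
        # "PROJECT_ROOT" stands for the absolute path of our repo root
        code_path=["PROJECT_ROOT/src"],
    )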

When logging the model, I get a long list of [Errno 95] Operation not supported errors, one for each notebook in our repo. This blocks us from registering the model in mlflow.

We have tried several ad-hoc solutions and workarounds: forcing all our code into one file, working only with files in the same directory (code_path=['./filename.py']), adding specific libraries (and changing import paths accordingly), and so on.

However, none of these is optimal. As a result, we either duplicate code (killing DRY) or move some imports inside the wrapper (i.e. the ones that cannot run in our working environment, since it differs from the one the model will see when deployed).

We have not yet tried moving all the notebooks (which we believe cause the [Errno 95] Operation not supported errors) into a separate folder. That would be highly disruptive to our current setup and processes, and we'd like to avoid it as much as we can.

Did anyone encounter a similar situation?

Thanks in advance

2 REPLIES

Anonymous
Not applicable

@Idan Reshef:

Yes, it's not uncommon to encounter issues with registering models on mlflow when using Databricks and importing code from other modules. One possible solution is to specify the code dependencies explicitly using the conda_env parameter in the mlflow.pyfunc.log_model method.

For example, you can create a conda environment YAML file (environment.yml) that lists all the required packages and dependencies, and specify the path to this file using the conda_env parameter:

import mlflow.pyfunc
 
# Minimal placeholder wrapper; replace with your real pyfunc model class
class MyModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        return model_input
 
model = MyModel()
 
# Path to the environment.yml file listing the required packages
conda_env = "path/to/environment.yml"
 
# Log the model, specifying the code path and the conda environment
mlflow.pyfunc.log_model(
    artifact_path="model",
    python_model=model,
    code_path=["src"],
    conda_env=conda_env,
)

This will ensure that all the required packages are installed in the environment when the model is deployed, and that your code can import the necessary modules as usual.
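
If you prefer not to maintain a separate YAML file, conda_env can also be passed as a Python dictionary. A minimal sketch, assuming a trivial placeholder model; the package list and versions are placeholders you should replace with your project's actual dependencies:

import mlflow.pyfunc

class IdentityModel(mlflow.pyfunc.PythonModel):
    # Trivial placeholder standing in for the project's real wrapper
    def predict(self, context, model_input):
        return model_input

# Environment spec passed inline as a dict instead of a YAML path;
# the packages and versions below are illustrative only.
conda_env = {
    "name": "model-env",
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.10",
        "pip",
        {"pip": ["mlflow", "pandas"]},
    ],
}

mlflow.pyfunc.log_model(
    artifact_path="model",
    python_model=IdentityModel(),
    code_path=["src"],
    conda_env=conda_env,
)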

Another option, depending on your deployment target, is to include additional source files when the model is deployed. Note that there is no generic mlflow pyfunc deploy --extra-files command in the MLflow CLI; extra-file support is specific to individual deployment plugins (some mlflow deployments targets accept an extra-files style configuration). In practice, everything listed in code_path is copied into the logged model artifact under its code/ directory, so the log_model approach above is usually enough for your code to import the necessary modules when the model is served.
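
As a quick sanity check, you can load the logged model back and confirm that the packaged src code is importable. A sketch, assuming the logging run above; the run ID and the sample input are placeholders:

import pandas as pd
import mlflow.pyfunc

# Hypothetical URI; substitute the run ID from your own logging run
model = mlflow.pyfunc.load_model("runs:/<run_id>/model")

# Placeholder input frame; if code_path was packaged correctly, the
# "from src.... import ..." statements inside the wrapper resolve here.
sample_input = pd.DataFrame({"feature": [1.0, 2.0]})
print(model.predict(sample_input))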

I hope this helps! Let me know if you have any further questions.

Anonymous
Not applicable

Hi @Idan Reshef,

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
