Databricks Community

Idan · 02-07-2023

We are using Databricks on AWS and registering models with MLflow. Our in-project imports are written as from src.<module location> import <objects>.

Following examples online, I expected that calling mlflow.pyfunc.log_model(..., code_path=['PROJECT_ROOT/src'], ...) would add the entire code tree to the model's runtime environment, letting us keep our imports as they are.
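As a rough illustration of that expectation: the MLflow documentation describes code_path entries as local files or directories that are packaged with the model and prepended to the system path when the model is loaded. The sketch below simulates that behavior with the standard library only. It is a simplified model, not MLflow's implementation; the helper name simulate_code_path and the code/ folder layout are assumptions for illustration.

```python
import os
import shutil
import sys


def simulate_code_path(code_paths, model_dir):
    """Roughly mimic how code_path entries are packaged (simplified, not MLflow's code).

    Each entry is copied under <model_dir>/code/, and that code/ directory is
    put at the front of sys.path, so a copied "src" folder is importable as "src".
    """
    code_root = os.path.join(model_dir, "code")
    os.makedirs(code_root, exist_ok=True)
    for path in code_paths:
        dest = os.path.join(code_root, os.path.basename(os.path.normpath(path)))
        if os.path.isdir(path):
            # Copies the whole directory tree, one file at a time
            shutil.copytree(path, dest)
        else:
            # Single-file entries such as "./filename.py"
            shutil.copy2(path, dest)
    if code_root not in sys.path:
        sys.path.insert(0, code_root)
    return code_root
```

Because the whole tree is copied file by file, any entry that cannot be read as a regular file would be enough to make the copy fail, which would be consistent with the per-notebook errors described next.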

When logging the model, I get a long list of "[Errno 95] Operation not supported" errors, one for each notebook in our repo, which prevents us from registering the model in MLflow.

We have tried several ad-hoc workarounds: keeping all code in one file, working only with files in the same directory (code_path=['./filename.py']), adding specific libraries (and changing import paths accordingly), etc.

However, none of these is optimal. We end up either duplicating code (breaking DRY) or placing some imports inside the wrapper (namely those that cannot run in our working environment, because it differs from the environment the model will have when deployed).

We have not yet tried moving all the notebooks (which we believe cause the [Errno 95] Operation not supported errors) into a separate folder. That would be highly disruptive to our current setup and processes, and we'd like to avoid it as much as we can.
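Before reorganizing the repo, one way to check that hypothesis is to walk src and try to read each file, collecting the ones that fail with errno 95 (EOPNOTSUPP on Linux). This is a hypothetical diagnostic helper, not part of MLflow, and it assumes the errors come from reading the files, which is the first thing any copy step has to do.

```python
import errno
import os


def find_unreadable_files(root):
    """Return paths under root whose open/read fails with EOPNOTSUPP (errno 95 on Linux).

    Hypothetical diagnostic: it opens each file and reads one byte, which is
    roughly what a copy would do first. Other OS errors are re-raised.
    """
    unreadable = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    fh.read(1)
            except OSError as exc:
                if exc.errno == errno.EOPNOTSUPP:
                    unreadable.append(path)
                else:
                    raise
    return sorted(unreadable)
```

For example, find_unreadable_files("PROJECT_ROOT/src") would list the offending paths; if they turn out to be exactly the notebooks, that supports excluding them from what gets passed to code_path.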

Has anyone encountered a similar situation?

Thanks in advance

Anonymous · 04-09-2023

@Idan Reshef:

Yes, it's not uncommon to run into issues when registering models in MLflow on Databricks while importing code from other modules. One possible approach is to declare the model's package dependencies explicitly with the conda_env parameter of mlflow.pyfunc.log_model, alongside code_path for your own source code.

For example, you can create a conda environment YAML file (environment.yml) that lists all the required packages and dependencies, and specify the path to this file using the conda_env parameter:

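import mlflow.pyfunc

# Placeholder model wrapper; replace with your own PythonModel subclass
class MyModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        return model_input  # placeholder logic

model = MyModel()

# Define the path to the environment.yml file (third-party packages only)
conda_env = "path/to/environment.yml"

# Log the model: code_path ships the local src directory with the model,
# conda_env declares the packages to install in the serving environment
mlflow.pyfunc.log_model(
    python_model=model,
    artifact_path="model",
    code_path=["src"],
    conda_env=conda_env,
)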

The conda_env file makes sure the required third-party packages are installed when the model is deployed, while code_path copies the src directory into the model and adds it to the Python path when the model is loaded, so your code can keep importing modules as usual.
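conda_env does not have to be a file: per the MLflow documentation it accepts either a path to a conda YAML file or a dictionary with the same structure. A sketch of the dictionary form follows. The environment name, channel, Python version and package list are placeholders, and list_pip_packages is a hypothetical helper used only to inspect the dict.

```python
# Hypothetical conda environment expressed as a dict; every value here is a
# placeholder, not a recommendation.
conda_env = {
    "name": "model-env",
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.10",
        "pip",
        {"pip": ["mlflow", "pandas"]},
    ],
}


def list_pip_packages(env):
    """Collect the pip packages listed in a conda env dict (inspection helper)."""
    packages = []
    for dep in env["dependencies"]:
        # pip packages live in a nested {"pip": [...]} entry
        if isinstance(dep, dict):
            packages.extend(dep.get("pip", []))
    return packages


# The dict could then be passed directly, e.g.
# mlflow.pyfunc.log_model(..., code_path=["src"], conda_env=conda_env)
```

Either form only covers installable packages; the project's own modules still have to reach the model through code_path.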

Another option, for supporting files that are not importable Python code (configuration files, lookup tables, etc.), is the artifacts argument of mlflow.pyfunc.log_model. It maps names to local paths; MLflow copies those files into the model and exposes their local paths to the model through context.artifacts, for example in load_context. Python source should still go through code_path so that it ends up on the import path.
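If the Errno 95 errors come from notebooks inside src, one workaround that leaves the repo layout alone is to stage a copy of src containing only the files that can actually be copied, and pass that copy to code_path. The sketch below assumes the unreadable entries are notebooks the model does not need at inference time; stage_code_dir is a hypothetical helper built on shutil.copytree with a custom copy_function.

```python
import errno
import os
import shutil
import tempfile


def stage_code_dir(src_dir, staging_root=None):
    """Copy src_dir into a staging folder, skipping files that fail with EOPNOTSUPP.

    Returns (staged_path, skipped_paths). Assumes the skipped entries are
    workspace notebooks that the model does not need when it is loaded.
    """
    staging_root = staging_root or tempfile.mkdtemp(prefix="code_path_")
    # Keep the original folder name (e.g. "src") so imports like src.x still work
    staged = os.path.join(staging_root, os.path.basename(os.path.normpath(src_dir)))
    skipped = []

    def copy_or_skip(src, dst):
        try:
            return shutil.copy2(src, dst)
        except OSError as exc:
            if exc.errno == errno.EOPNOTSUPP:
                skipped.append(src)  # record and skip instead of failing the copy
                return dst
            raise

    shutil.copytree(src_dir, staged, copy_function=copy_or_skip)
    return staged, skipped
```

The staged folder keeps the name src, so passing code_path=[staged] should preserve imports of the form from src.<module> import <object>. It is worth printing the skipped list to confirm that only notebooks were dropped.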

I hope this helps! Let me know if you have any further questions.

Anonymous · 04-10-2023

Hi @Idan Reshef

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!


Using code_path in mlflow.pyfunc models on Databricks
