Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Error when creating model env using 'virtualenv' with DBR 14.3

drjb1010
New Contributor

We were trying to run inference from a logged model but hit the following error (see attached screenshot: Screen Shot 2025-02-05 at 10.05.12 AM.png).

Previously, we had been using `conda` as the environment manager, but that is no longer supported. I tried to update pyenv as some suggested but didn't get anywhere. Any insights for fixing this issue would be appreciated!

2 REPLIES

alonisser
Contributor II

This happened to me too, and the `pyenv update` command is missing. Any workarounds here?

Louis_Frolio
Databricks Employee

Hello @drjb1010 , 

This is a known issue with DBR 14.3 where the `virtualenv` environment manager fails because it depends on `pyenv` to install specific Python versions, but `pyenv` is either not installed or not properly configured in the runtime environment.

## Understanding the Problem

The error occurs because when you specify `env_manager="virtualenv"`, MLflow attempts to create an isolated Python environment matching your model's training environment. It tries to use `pyenv` to install Python 3.9.19, but the command fails with exit code 2, indicating one of the following:

- `pyenv` is not properly installed in DBR 14.3
- The Python version (3.9.19) cannot be installed via pyenv
- Required dependencies for building Python from source are missing
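
As a quick first check, you can verify from a Python notebook cell whether `pyenv` is even present on the driver's PATH. This is a minimal diagnostic sketch (not an MLflow API, just standard library calls):

```python
import shutil
import subprocess

# Look for pyenv on the driver node's PATH.
pyenv_path = shutil.which("pyenv")
if pyenv_path is None:
    print("pyenv not found on PATH -- env_manager='virtualenv' will fail here")
else:
    result = subprocess.run([pyenv_path, "--version"], capture_output=True, text=True)
    print(f"pyenv at {pyenv_path}: {result.stdout.strip()}")
```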

The transition away from `conda` as an environment manager has left `virtualenv` as an option, but it has dependencies that aren't fully satisfied in DBR 14.3.

## Recommended Solution

Use `env_manager="local"` instead of `env_manager="virtualenv"`:

```python
model_udf_score = mlflow.pyfunc.spark_udf(
    spark,
    model_version_uri,
    env_manager="local",  # Change from "virtualenv" to "local"
    params={"predict_method": "predict_score"},
)
```

## What This Means

When using `env_manager="local"`:

- The model will use the cluster's existing Python environment
- No isolated environment creation occurs
- Dependencies must already be installed on the cluster
- You lose the environment isolation benefit but gain stability
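
Since dependencies must already be present on the cluster, a quick way to spot gaps is to compare the model's logged requirements against what the cluster's Python environment actually has installed. A minimal sketch (the `missing_requirements` helper and the sample requirement strings are illustrative, not part of MLflow):

```python
from importlib.metadata import PackageNotFoundError, version

def missing_requirements(requirements):
    """Return distribution names from requirements.txt-style lines that are
    not installed in the current environment."""
    missing = []
    for line in requirements:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name = line.split(";")[0]  # drop environment markers
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<"):
            name = name.split(sep)[0]  # strip version specifiers
        name = name.strip()
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing

# Example with hypothetical requirement lines:
print(missing_requirements(["mlflow>=2.0", "no-such-package-zzz==1.0"]))
```

Anything the helper reports as missing needs to be installed on the cluster (see the options below) before the UDF will work with `env_manager="local"`.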

## Ensuring Dependencies Are Met

Since you're using the local environment, make sure your cluster has the required dependencies installed:

Option 1: Install via notebook
```python
%pip install -r /path/to/requirements.txt
```

Option 2: Cluster Libraries
Install the required libraries directly on the cluster through the Databricks UI under cluster configuration.

Option 3: Init Scripts
Create an init script to install dependencies when the cluster starts.
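
For example, a cluster-scoped init script along these lines would install the model's requirements at cluster startup. This is a sketch: `/databricks/python/bin/pip` is the standard path to the cluster Python's pip, but the requirements file path is a hypothetical placeholder you would replace with your own:

```shell
#!/bin/bash
set -euo pipefail

# pip for the cluster's Python environment (standard Databricks path)
PIP="/databricks/python/bin/pip"

if [ -x "$PIP" ]; then
    # Hypothetical path -- point this at your model's requirements file
    "$PIP" install -r /Workspace/Shared/model-requirements.txt
else
    echo "Not on a Databricks cluster; skipping install" >&2
fi
```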

## Alternative Approach

If you absolutely need environment isolation, consider:

Pre-installing dependencies: Before loading the model, manually install all required packages that match your model's dependencies using `%pip install`.
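
MLflow can resolve the requirements file that was logged with the model, which you can then feed to pip before loading with `env_manager="local"`. A sketch, assuming `mlflow` is installed and the model artifact is reachable from the cluster (the helper name and the example URI are hypothetical; `mlflow.pyfunc.get_model_dependencies` is a real MLflow API):

```python
import importlib.util
import subprocess
import sys

def install_model_dependencies(model_uri: str) -> None:
    """Fetch the requirements file logged with the model and pip-install it
    into the current (cluster) environment. Requires mlflow."""
    if importlib.util.find_spec("mlflow") is None:
        raise RuntimeError("mlflow is not installed in this environment")
    import mlflow.pyfunc

    # Path to the requirements.txt that was logged alongside the model
    req_path = mlflow.pyfunc.get_model_dependencies(model_uri)
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", req_path])

# Usage (hypothetical model version URI):
# install_model_dependencies("models:/my-model/3")
```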

Use Model Serving: Instead of using `spark_udf`, deploy your model to a Model Serving endpoint, which handles environment management differently.
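
If you go the Model Serving route, a deployment can be scripted with the `databricks-sdk` package. This is a hedged sketch, not a definitive recipe: the function name, endpoint name, and model identifiers are illustrative, and you should check the current SDK documentation for the exact API surface:

```python
import importlib.util

def deploy_model_endpoint(endpoint_name: str, model_name: str, version: str) -> None:
    """Create a Model Serving endpoint for a registered model version.
    Requires the databricks-sdk package and workspace credentials."""
    if importlib.util.find_spec("databricks") is None:
        raise RuntimeError("databricks-sdk is not installed")
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.serving import (
        EndpointCoreConfigInput,
        ServedEntityInput,
    )

    w = WorkspaceClient()
    w.serving_endpoints.create(
        name=endpoint_name,
        config=EndpointCoreConfigInput(
            served_entities=[
                ServedEntityInput(
                    entity_name=model_name,    # registered model name
                    entity_version=version,    # model version to serve
                    workload_size="Small",
                    scale_to_zero_enabled=True,
                )
            ]
        ),
    )

# Usage (hypothetical names):
# deploy_model_endpoint("score-endpoint", "my-model", "3")
```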

## Long-term Recommendation

Monitor Databricks release notes for updates to environment management in future DBR versions. The current state suggests that `env_manager="local"` is the most reliable option until Databricks provides better support for isolated environments without conda dependency.

This issue has been reported by multiple users and appears to be a gap in the current DBR 14.3 implementation. Using `env_manager="local"` is the recommended workaround that will allow you to proceed with your inference workload.

Hope this helps, Louis.