
Serving Endpoint Container Image Creation Fails

ukaplan
New Contributor III

Hello, I trained a model using MLflow and saved it as an artifact. I can load the model from a notebook by its URI, and it works as expected.

However, when I want to deploy it using Databricks endpoints, container image creation fails with the following build log:


#21 0.218 channels:
#21 0.218 - conda-forge
#21 0.218 dependencies:
#21 0.218 - python=3.11.0
#21 0.218 - pip<=23.2.1
#21 0.218 - pip:
#21 0.218   - mlflow==2.11.3
#21 0.218   - accelerate==0.30.1
#21 0.218   - astunparse==1.6.3
#21 0.218   - bcrypt==3.2.0
#21 0.218   - boto3==1.34.39
#21 0.218   - configparser==5.2.0
#21 0.218   - defusedxml==0.7.1
#21 0.218   - dill==0.3.6
#21 0.218   - google-cloud-storage==2.10.0
#21 0.218   - ipython==8.15.0
#21 0.218   - lz4==4.3.2
#21 0.218   - optree==0.11.0
#21 0.218   - pydantic==1.10.6
#21 0.218   - pynvml==11.5.0
#21 0.218   - pyopenssl==23.2.0
#21 0.218   - python-snappy==0.6.1
#21 0.218   - sentence-transformers==2.7.0
#21 0.218   - sentencepiece==0.1.99
#21 0.218   - torch==2.3.0
#21 0.218   - transformers==4.40.2
#21 0.218 name: mlflow-env

...

And after a series of successful installations:

#21 161.6 Building wheels for collected packages: python-snappy
#21 161.6 Building wheel for python-snappy (setup.py): started
#21 161.6 Building wheel for python-snappy (setup.py): finished with status 'error'
#21 161.6 Running setup.py clean for python-snappy
#21 161.6 Failed to build python-snappy
#21 161.6 Pip subprocess error:
#21 161.6 error: subprocess-exited-with-error
#21 161.6
#21 161.6 × python setup.py bdist_wheel did not run successfully.
#21 161.6 │ exit code: 1
#21 161.6 ╰─> [27 lines of output]
#21 161.6 /opt/conda/envs/mlflow-env/lib/python3.11/site-packages/setuptools/_distutils/dist.py:268: UserWarning: Unknown distribution option: 'cffi_modules'
#21 161.6 warnings.warn(msg)
#21 161.6 running bdist_wheel
#21 161.6 running build
#21 161.6 running build_py
#21 161.6 creating build
#21 161.6 creating build/lib.linux-x86_64-cpython-311
#21 161.6 creating build/lib.linux-x86_64-cpython-311/snappy
#21 161.6 copying src/snappy/__init__.py -> build/lib.linux-x86_64-cpython-311/snappy
#21 161.6 copying src/snappy/snappy_cffi_builder.py -> build/lib.linux-x86_64-cpython-311/snappy
#21 161.6 copying src/snappy/__main__.py -> build/lib.linux-x86_64-cpython-311/snappy
#21 161.6 copying src/snappy/snappy.py -> build/lib.linux-x86_64-cpython-311/snappy
#21 161.6 copying src/snappy/hadoop_snappy.py -> build/lib.linux-x86_64-cpython-311/snappy
#21 161.6 copying src/snappy/snappy_cffi.py -> build/lib.linux-x86_64-cpython-311/snappy
#21 161.6 copying src/snappy/snappy_formats.py -> build/lib.linux-x86_64-cpython-311/snappy
#21 161.6 running build_ext
#21 161.6 building 'snappy._snappy' extension
#21 161.6 creating build/temp.linux-x86_64-cpython-311
#21 161.6 creating build/temp.linux-x86_64-cpython-311/src
#21 161.6 creating build/temp.linux-x86_64-cpython-311/src/snappy
#21 161.6 gcc -pthread -B /opt/conda/envs/mlflow-env/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/mlflow-env/include -fPIC -O2 -isystem /opt/conda/envs/mlflow-env/include -fPIC -I/opt/conda/envs/mlflow-env/include/python3.11 -c src/snappy/crc32c.c -o build/temp.linux-x86_64-cpython-311/src/snappy/crc32c.o
#21 161.6 gcc -pthread -B /opt/conda/envs/mlflow-env/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/mlflow-env/include -fPIC -O2 -isystem /opt/conda/envs/mlflow-env/include -fPIC -I/opt/conda/envs/mlflow-env/include/python3.11 -c src/snappy/snappymodule.cc -o build/temp.linux-x86_64-cpython-311/src/snappy/snappymodule.o
#21 161.6 src/snappy/snappymodule.cc:33:10: fatal error: snappy-c.h: No such file or directory
#21 161.6 33 | #include <snappy-c.h>
#21 161.6 | ^~~~~~~~~~~~
#21 161.6 compilation terminated.
#21 161.6 error: command '/usr/bin/gcc' failed with exit code 1
#21 161.6 [end of output]
#21 161.6
#21 161.6 note: This error originates from a subprocess, and is likely not a problem with pip.
#21 161.6 ERROR: Failed building wheel for python-snappy
#21 161.6 ERROR: Could not build wheels for python-snappy, which is required to install pyproject.toml-based projects
#21 161.6
#21 161.6
#21 161.6 failed
#21 161.6
#21 161.6 CondaEnvException: Pip failed
#21 161.6
#21 167.5 Retrying in 10 seconds... (Attempt 2/5)
#21 177.9
#21 177.9 CondaValueError: prefix already exists: /opt/conda/envs/mlflow-env
#21 177.9
#21 177.9 Retrying in 10 seconds... (Attempt 3/5)
#21 188.2
#21 188.2 CondaValueError: prefix already exists: /opt/conda/envs/mlflow-env
#21 188.2
#21 188.2 Retrying in 10 seconds... (Attempt 4/5)
#21 198.6
#21 198.6 CondaValueError: prefix already exists: /opt/conda/envs/mlflow-env
#21 198.6
#21 198.6 Retrying in 10 seconds... (Attempt 5/5)
#21 208.9
#21 208.9 CondaValueError: prefix already exists: /opt/conda/envs/mlflow-env
#21 208.9
#21 209.0 Failed to create conda environment after 5 attempts.
#21 ERROR: process "/bin/sh -c echo $BUILD_LOG_START_DELIMITER && cat model/conda.yaml && bash conda-env-create.sh --max-retries 5 && echo $BUILD_LOG_CONDA_END_DELIMITER && echo $BUILD_LOG_END_DELIMITER && conda clean -afy" did not complete successfully: exit code: 1
------
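
Editor's note: the decisive lines in a log like this are the missing `snappy-c.h` header (part of the Snappy C library's development files) and the wheel build that fails because of it. The later `CondaValueError: prefix already exists` retries appear to be follow-on failures. Below is a minimal sketch for pulling those two signals out of a long build log. The function name `summarize_build_log` is hypothetical, and the patterns only match the message formats visible in this log.

```python
import re
from typing import Dict, List


def summarize_build_log(log_text: str) -> Dict[str, List[str]]:
    """Collect failed wheel builds and missing C headers from a pip/conda build log."""
    # pip prints "Failed building wheel for <package>" when a source build fails.
    failed = re.findall(r"Failed building wheel for ([A-Za-z0-9._-]+)", log_text)
    # gcc prints "fatal error: <header>: No such file or directory" for a missing include.
    headers = re.findall(r"fatal error: ([^\s:]+): No such file or directory", log_text)
    return {
        "failed_wheels": sorted(set(failed)),
        "missing_headers": sorted(set(headers)),
    }


# Two lines copied from the build log in this post.
sample = """\
#21 161.6 src/snappy/snappymodule.cc:33:10: fatal error: snappy-c.h: No such file or directory
#21 161.6 ERROR: Failed building wheel for python-snappy
"""

print(summarize_build_log(sample))
# {'failed_wheels': ['python-snappy'], 'missing_headers': ['snappy-c.h']}
```

Run against the full log, this points straight at `python-snappy` as the package to investigate.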

11 REPLIES

ukaplan
New Contributor III

Hello @Retired_mod, I could not find the option to send a private message. Could you send me a message so I can reply there?

Edit: I have found it and am sending a message now.

ukaplan
New Contributor III

Hello again @Retired_mod,

Unfortunately, the first time I tried to send the message, the platform complained that the message body contained HTML, even though it did not. When I refreshed the page and tried again, I was told I had reached my private message limit. Could you send me a message so I can reply?

Best,

ukaplan
New Contributor III

Hello @Retired_mod, I use AWS and DBR 15.3.

ukaplan
New Contributor III

Hi @Retired_mod, thank you. Do you have an estimated timeline for troubleshooting?

eystein
New Contributor II

Hi @Retired_mod, I have the same problem on Azure.

ukaplan
New Contributor III

Right now, I am fairly sure the issue is with MLflow's automatic inference of the required packages. It decides the model needs "python-snappy", which is not the case. I edited the requirements files, but I guess the requirements are captured when the model is registered, so by then it was too late. In the end I deployed the model on another platform that let me specify the packages by hand. I don't think Databricks provides the same level of control; is that right, @Retired_mod?
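
Editor's note: if the inferred requirement list is indeed the culprit, one option worth trying is to pass the requirements explicitly when logging the model, so nothing is inferred. The sketch below assumes MLflow 2.x (the build log pins `mlflow==2.11.3`) and uses the `pip_requirements` argument of `mlflow.pyfunc.log_model`; the flavor-specific `log_model` functions generally accept the same argument. The helper name, the `python_model` object, and the shortened requirement list are illustrative assumptions, not the original poster's code.

```python
from typing import List, Sequence

# Pins taken from the build log in this thread, with python-snappy left out.
# Which packages the model truly needs is an assumption to verify against
# your own code; this list is deliberately short.
EXPLICIT_REQUIREMENTS: List[str] = [
    "mlflow==2.11.3",
    "accelerate==0.30.1",
    "sentence-transformers==2.7.0",
    "sentencepiece==0.1.99",
    "torch==2.3.0",
    "transformers==4.40.2",
]


def log_with_explicit_requirements(
    python_model,
    requirements: Sequence[str] = EXPLICIT_REQUIREMENTS,
    artifact_path: str = "model",
):
    """Log a pyfunc model with a hand-written requirement list.

    `python_model` is a hypothetical mlflow.pyfunc.PythonModel instance.
    """
    # Imported lazily so the list above can be inspected without MLflow installed.
    import mlflow

    with mlflow.start_run():
        # When pip_requirements is given, MLflow writes these pins to the
        # model's requirements.txt and conda.yaml instead of inferring them.
        return mlflow.pyfunc.log_model(
            artifact_path=artifact_path,
            python_model=python_model,
            pip_requirements=list(requirements),
        )
```

The model would then need to be registered again from the new run, since the environment files are stored with the logged model.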

raju_11
New Contributor III

Hi @ukaplan, could you please tell me how to resolve this error? I have the same issue.

ukaplan
New Contributor III

Hello @raju_11, it depends on how you want to proceed. I do not know how to deploy it on Databricks. But if you are open to using another provider, this is how I did it: I use a serverless GPU provider and connect to my MLflow server on Databricks to pull the registered model. Because the other provider also asks you to supply a requirements file, you can copy the registered requirements.txt from your MLflow server and leave out the python-snappy line. This should fix the problem.
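
Editor's note: the "copy requirements.txt except python-snappy" step can be scripted. This is a minimal sketch using only the standard library; the function names are hypothetical, and the name matching is a simple approximation of pip's name normalization rather than a full requirement parser.

```python
import re
from typing import Iterable, List

# Leading package name of a requirement line, e.g. "python-snappy" in "python-snappy==0.6.1".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def _normalize(name: str) -> str:
    """Treat '-', '_' and '.' as equivalent and ignore case when comparing names."""
    return re.sub(r"[-_.]+", "-", name).lower()


def drop_requirement(lines: Iterable[str], package: str) -> List[str]:
    """Return the requirement lines without any line that pins `package`."""
    target = _normalize(package)
    kept = []
    for line in lines:
        match = _NAME.match(line)
        if match and _normalize(match.group(1)) == target:
            continue  # skip the unwanted package
        kept.append(line)  # comments, blank lines and other packages pass through
    return kept


registered = [
    "mlflow==2.11.3",
    "python-snappy==0.6.1",
    "torch==2.3.0",
]
print(drop_requirement(registered, "python-snappy"))
# ['mlflow==2.11.3', 'torch==2.3.0']
```

The filtered lines can be written back to a new requirements.txt for the other provider.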

raju_11
New Contributor III
@ukaplan @Retired_mod
I am getting this error:
Unsupported access mechanism for MLflow artifacts
Artifacts stored in 'dbfs:/databricks/mlflow-tracking' can only be accessed using version 1.9.1 or later of the MLflow client

I want to delete the existing artifacts under the path
 dbfs:/databricks/mlflow-tracking/...../artifacts/model
but I cannot interact with them: I cannot access, edit, or delete them. Could you please tell me how to do this?
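
Editor's note: the error message says these artifacts are reachable only through the MLflow client, version 1.9.1 or later, so one thing worth ruling out is an outdated client before working through the client API. The sketch below is an assumption-laden illustration: the version check is plain Python, and `list_model_artifacts` uses `MlflowClient.list_artifacts` with a hypothetical `run_id` that you would take from the run that logged the model. It does not by itself delete anything.

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.11.3' or '2.0.0rc1' into a tuple of leading integers per component."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break  # stop at suffixes such as 'rc1'
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def client_is_new_enough(installed: str, minimum: str = "1.9.1") -> bool:
    """Compare versions numerically, so '1.10.0' counts as newer than '1.9.1'."""
    return parse_version(installed) >= parse_version(minimum)


def list_model_artifacts(run_id: str, path: str = "model") -> List[str]:
    """List artifact paths of a run through the MLflow client (run_id is hypothetical)."""
    # Imported lazily so the version helpers work without MLflow installed.
    from mlflow.tracking import MlflowClient

    return [info.path for info in MlflowClient().list_artifacts(run_id, path)]


print(client_is_new_enough("2.11.3"))  # True
print(client_is_new_enough("1.8.0"))   # False
```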

ivan_calvo
Databricks Employee

@Retired_mod @ukaplan @raju_11 @eystein 

Solution: Downgrade your ML cluster from DBR 15.X LTS ML to DBR 14.3 LTS ML. Then, register the model again.

My scenario:

I was registering a Hugging Face T5 model using a DBR 15.4 LTS version and ran into the same errors:

  • src/snappy/snappymodule.cc:33:10: fatal error: snappy-c.h: No such file or directory
  • error: command '/usr/bin/g++' failed with exit code 1
  • ERROR: Failed building wheel for python-snappy

I retried the model registration using a DBR 14.3 LTS ML cluster and then served the model, and it worked.

I hope this helps!

damselfly20
New Contributor III

@ivan_calvo The problem still exists. Surely there has to be some other option than downgrading the ML cluster to DBR 14.3 LTS ML?
