08-19-2024 07:20 AM
Hello, I trained a model using MLflow and saved it as an artifact. I can load the model from a notebook using its URI, and it works as expected.
However, when I want to deploy it using Databricks endpoints, container image creation fails with the following build log:
#21 0.218 channels:
#21 0.218 - conda-forge
#21 0.218 dependencies:
#21 0.218 - python=3.11.0
#21 0.218 - pip<=23.2.1
#21 0.218 - pip:
#21 0.218 - mlflow==2.11.3
#21 0.218 - accelerate==0.30.1
#21 0.218 - astunparse==1.6.3
#21 0.218 - bcrypt==3.2.0
#21 0.218 - boto3==1.34.39
#21 0.218 - configparser==5.2.0
#21 0.218 - defusedxml==0.7.1
#21 0.218 - dill==0.3.6
#21 0.218 - google-cloud-storage==2.10.0
#21 0.218 - ipython==8.15.0
#21 0.218 - lz4==4.3.2
#21 0.218 - optree==0.11.0
#21 0.218 - pydantic==1.10.6
#21 0.218 - pynvml==11.5.0
#21 0.218 - pyopenssl==23.2.0
#21 0.218 - python-snappy==0.6.1
#21 0.218 - sentence-transformers==2.7.0
#21 0.218 - sentencepiece==0.1.99
#21 0.218 - torch==2.3.0
#21 0.218 - transformers==4.40.2
#21 0.218 name: mlflow-env
...
And after a series of successful installations:
#21 161.6 Building wheels for collected packages: python-snappy
#21 161.6 Building wheel for python-snappy (setup.py): started
#21 161.6 Building wheel for python-snappy (setup.py): finished with status 'error'
#21 161.6 Running setup.py clean for python-snappy
#21 161.6 Failed to build python-snappy
#21 161.6 Pip subprocess error:
#21 161.6 error: subprocess-exited-with-error
#21 161.6
#21 161.6 × python setup.py bdist_wheel did not run successfully.
#21 161.6 │ exit code: 1
#21 161.6 ╰─> [27 lines of output]
#21 161.6 /opt/conda/envs/mlflow-env/lib/python3.11/site-packages/setuptools/_distutils/dist.py:268: UserWarning: Unknown distribution option: 'cffi_modules'
#21 161.6 warnings.warn(msg)
#21 161.6 running bdist_wheel
#21 161.6 running build
#21 161.6 running build_py
#21 161.6 creating build
#21 161.6 creating build/lib.linux-x86_64-cpython-311
#21 161.6 creating build/lib.linux-x86_64-cpython-311/snappy
#21 161.6 copying src/snappy/__init__.py -> build/lib.linux-x86_64-cpython-311/snappy
#21 161.6 copying src/snappy/snappy_cffi_builder.py -> build/lib.linux-x86_64-cpython-311/snappy
#21 161.6 copying src/snappy/__main__.py -> build/lib.linux-x86_64-cpython-311/snappy
#21 161.6 copying src/snappy/snappy.py -> build/lib.linux-x86_64-cpython-311/snappy
#21 161.6 copying src/snappy/hadoop_snappy.py -> build/lib.linux-x86_64-cpython-311/snappy
#21 161.6 copying src/snappy/snappy_cffi.py -> build/lib.linux-x86_64-cpython-311/snappy
#21 161.6 copying src/snappy/snappy_formats.py -> build/lib.linux-x86_64-cpython-311/snappy
#21 161.6 running build_ext
#21 161.6 building 'snappy._snappy' extension
#21 161.6 creating build/temp.linux-x86_64-cpython-311
#21 161.6 creating build/temp.linux-x86_64-cpython-311/src
#21 161.6 creating build/temp.linux-x86_64-cpython-311/src/snappy
#21 161.6 gcc -pthread -B /opt/conda/envs/mlflow-env/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/mlflow-env/include -fPIC -O2 -isystem /opt/conda/envs/mlflow-env/include -fPIC -I/opt/conda/envs/mlflow-env/include/python3.11 -c src/snappy/crc32c.c -o build/temp.linux-x86_64-cpython-311/src/snappy/crc32c.o
#21 161.6 gcc -pthread -B /opt/conda/envs/mlflow-env/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/mlflow-env/include -fPIC -O2 -isystem /opt/conda/envs/mlflow-env/include -fPIC -I/opt/conda/envs/mlflow-env/include/python3.11 -c src/snappy/snappymodule.cc -o build/temp.linux-x86_64-cpython-311/src/snappy/snappymodule.o
#21 161.6 src/snappy/snappymodule.cc:33:10: fatal error: snappy-c.h: No such file or directory
#21 161.6 33 | #include <snappy-c.h>
#21 161.6 | ^~~~~~~~~~~~
#21 161.6 compilation terminated.
#21 161.6 error: command '/usr/bin/gcc' failed with exit code 1
#21 161.6 [end of output]
#21 161.6
#21 161.6 note: This error originates from a subprocess, and is likely not a problem with pip.
#21 161.6 ERROR: Failed building wheel for python-snappy
#21 161.6 ERROR: Could not build wheels for python-snappy, which is required to install pyproject.toml-based projects
#21 161.6
#21 161.6
#21 161.6 failed
#21 161.6
#21 161.6 CondaEnvException: Pip failed
#21 161.6
#21 167.5 Retrying in 10 seconds... (Attempt 2/5)
#21 177.9
#21 177.9 CondaValueError: prefix already exists: /opt/conda/envs/mlflow-env
#21 177.9
#21 177.9 Retrying in 10 seconds... (Attempt 3/5)
#21 188.2
#21 188.2 CondaValueError: prefix already exists: /opt/conda/envs/mlflow-env
#21 188.2
#21 188.2 Retrying in 10 seconds... (Attempt 4/5)
#21 198.6
#21 198.6 CondaValueError: prefix already exists: /opt/conda/envs/mlflow-env
#21 198.6
#21 198.6 Retrying in 10 seconds... (Attempt 5/5)
#21 208.9
#21 208.9 CondaValueError: prefix already exists: /opt/conda/envs/mlflow-env
#21 208.9
#21 209.0 Failed to create conda environment after 5 attempts.
#21 ERROR: process "/bin/sh -c echo $BUILD_LOG_START_DELIMITER && cat model/conda.yaml && bash conda-env-create.sh --max-retries 5 && echo $BUILD_LOG_CONDA_END_DELIMITER && echo $BUILD_LOG_END_DELIMITER && conda clean -afy" did not complete successfully: exit code: 1
------
08-20-2024 08:16 AM - edited 08-20-2024 08:18 AM
Hello @Retired_mod, I could not find the option to send a private message. Can you send me a message so I can reply there?
Edit: I have found it, sending a message now
08-20-2024 08:23 AM
Hello again @Retired_mod ,
Unfortunately, the first time I tried to send the message, the platform complained about HTML in the message body, even though there was none. Then, when I refreshed the page and tried again, I was told my private message limit had been reached. Can you send me a message so I can reply?
Best,
08-21-2024 01:10 AM
Hello @Retired_mod , I use AWS and DBR 15.3
08-22-2024 04:08 AM
Hi @Retired_mod , thank you. Do you have an estimated timeline for troubleshooting?
08-27-2024 10:59 PM
Hi @Retired_mod, same problem on Azure
08-28-2024 12:20 AM
Right now, I am pretty sure the issue is with MLflow's automatic inference of the required packages. It decides the model needs "python-snappy" for some reason, but that is not the case. I edited the requirements files, but I guess the requirements are stored together with the model when you register it, so editing them afterwards is too late. In the end, I deployed the model on another platform that let me specify the packages by hand. I don't think Databricks provides the same level of control, is that right @Retired_mod?
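For anyone who is able to log the model again, here is a minimal sketch of overriding the inferred requirements at logging time. It assumes the `pyfunc` flavor and that `python-snappy` really is unused by the model. `pip_requirements` is an existing parameter of MLflow's `log_model` functions; the helper names, the artifact path `"model"`, and the shortened requirement list are only illustrative.

```python
import re


def requirement_name(requirement):
    """Return the normalized package name of a pip requirement string."""
    # Cut at the first version specifier, extra, marker, or whitespace.
    name = re.split(r"[\s<>=!~;\[@]", requirement.strip(), maxsplit=1)[0]
    # Treat "-", "_" and "." as equivalent and ignore case.
    return re.sub(r"[-_.]+", "-", name).lower()


def filter_requirements(requirements, excluded):
    """Drop every requirement whose package name is in `excluded`."""
    blocked = {requirement_name(name) for name in excluded}
    return [r for r in requirements if requirement_name(r) not in blocked]


def relog_model(python_model, requirements):
    """Log a pyfunc model with an explicit requirement list.

    Hypothetical helper: needs mlflow installed and a reachable tracking server.
    """
    import mlflow  # imported lazily so the helpers above run without mlflow

    with mlflow.start_run():
        return mlflow.pyfunc.log_model(
            artifact_path="model",  # assumed artifact path
            python_model=python_model,
            pip_requirements=requirements,  # replaces the inferred list
        )


# Shortened version of the list from the build log, for illustration only.
inferred = [
    "mlflow==2.11.3",
    "python-snappy==0.6.1",
    "torch==2.3.0",
    "transformers==4.40.2",
]
explicit = filter_requirements(inferred, ["python-snappy"])
print(explicit)
```

The model logged this way would still need to be registered again before serving, since the endpoint builds its environment from the files stored with the registered version.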
08-29-2024 05:38 AM
Hi @ukaplan, could you please tell me how to resolve this error? I have the same issue.
08-29-2024 06:13 AM
Hello @raju_11, it depends on how you want to proceed. I do not know how to deploy it in Databricks. But if you are open to using another provider, this is how I did it: I use a serverless GPU provider and connect to my MLflow server on Databricks to pull the registered model. Because this provider also asks you to supply a requirements file, you can copy the requirements.txt registered on your MLflow server and leave out the python-snappy line. This should fix the problem.
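A rough sketch of the copy-and-remove step, assuming the registered requirements.txt has already been downloaded to a local path (for example, `mlflow.pyfunc.get_model_dependencies(model_uri)` returns such a path). The function name `strip_packages` and the file names are hypothetical.

```python
import re
from pathlib import Path


def package_name(line):
    """Return the normalized package name of one requirements.txt line."""
    name = re.split(r"[\s<>=!~;\[@]", line.strip(), maxsplit=1)[0]
    return re.sub(r"[-_.]+", "-", name).lower()


def strip_packages(src, dst, excluded):
    """Copy `src` to `dst`, leaving out lines for the excluded packages.

    Blank lines and comments are copied unchanged. Returns the kept lines.
    """
    blocked = {package_name(name) for name in excluded}
    kept = []
    for line in Path(src).read_text().splitlines():
        stripped = line.strip()
        is_requirement = bool(stripped) and not stripped.startswith("#")
        if is_requirement and package_name(stripped) in blocked:
            continue  # skip the unwanted package
        kept.append(line)
    Path(dst).write_text("\n".join(kept) + "\n")
    return kept
```

For example, `strip_packages("requirements.txt", "requirements.serving.txt", ["python-snappy"])` would write a copy without the python-snappy line, which can then be handed to the other provider.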
09-03-2024 06:43 AM
10-03-2024 09:55 AM - edited 10-03-2024 09:55 AM
@Retired_mod @ukaplan @raju_11 @eystein
Solution: Downgrade your ML cluster from DBR 15.X LTS ML to DBR 14.3 LTS ML. Then, register the model again.
My scenario:
I was registering a Hugging Face T5 model on a DBR 15.4 LTS cluster and ran into the same errors.
I retried the model registration using a DBR 14.3 LTS ML cluster and then served the model, and it worked.
I hope this helps!
yesterday
@ivan_calvo The problem still exists. Surely there has to be some other option than downgrading the ML cluster to DBR 14.3 LTS ML?