Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

I'm no longer able to install MLflow from PyPI on automated clusters

_CV
New Contributor III

Starting yesterday afternoon, my job clusters across different workspaces began throwing an error when installing the MLflow library from PyPI during cluster startup.

I'm using an Azure Databricks automated job cluster (details below) and installing MLflow (mlflow==1.26.1) as one of several dependent libraries via PyPI. I also tried changing the MLflow version and omitting the version entirely; neither changed the result.
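For reference, the library entry that appears in the error message (`pypi { package: "mlflow==1.26.1" }`) corresponds to one item in the job's `libraries` list. Below is a minimal sketch of building such an entry as JSON; the helper name `pypi_library` is hypothetical and only illustrates the shape, it is not part of any Databricks SDK.

```python
import json
from typing import Dict, Optional


def pypi_library(name: str, version: Optional[str] = None) -> Dict[str, Dict[str, str]]:
    """Build one PyPI entry for a job's `libraries` list (hypothetical helper)."""
    # Pin the version when one is given; otherwise let pip resolve the latest.
    package = f"{name}=={version}" if version else name
    return {"pypi": {"package": package}}


# The pinned entry from the post, followed by an unpinned variant.
libraries = [pypi_library("mlflow", "1.26.1")]
print(json.dumps(libraries, indent=2))
print(json.dumps([pypi_library("mlflow")], indent=2))
```

Both the pinned and unpinned forms were tried in the post, and both failed in the same way.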

These jobs and clusters were working the previous day. Any troubleshooting suggestions are much appreciated.

Cluster details:

Driver: Standard_DS5_v2

Workers: Standard_DS5_v2 · 8 workers

7.3 LTS (includes Apache Spark 3.0.1, Scala 2.12)

Error message:

Run result unavailable: job failed with error message
 Library installation failed for library due to user error for pypi {
  package: "mlflow==1.26.1"
}
. Error messages:
Library installation attempted on the driver node of cluster 0208-140630-58jkle3z and failed. Please refer to the following error message to fix the library or contact Databricks support. Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: org.apache.spark.SparkException: Process List(/databricks/python/bin/pip, install, mlflow==1.26.1, --disable-pip-version-check) exited with code 1.   ERROR: Command errored out with exit status 1:
   command: /databricks/python3/bin/python3.7 /databricks/python3/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpiooih4q6
       cwd: /tmp/pip-install-ffemy0b0/alembic
  Complete output (16 lines):
  Traceback (most recent call last):
    File "/databricks/python3/lib/python3.7/sit ...
***WARNING: message truncated. Skipped 1338 bytes of output**
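One detail in the truncated log may help narrow things down: the `cwd` line points at a pip build directory named `alembic`, which suggests pip was building that package, not mlflow itself, when the command failed. The log is cut off, so this is only a hint. A minimal sketch for pulling that name out of a pip error log follows; the helper `failing_package` is hypothetical.

```python
import re
from typing import Optional

# Matches pip's temporary build directory, e.g. /tmp/pip-install-ffemy0b0/alembic
_CWD_PATTERN = re.compile(r"cwd:\s*\S*/pip-install-[^/\s]+/([^/\s]+)")


def failing_package(log_text: str) -> Optional[str]:
    """Return the directory name on pip's 'cwd:' line, or None if absent."""
    match = _CWD_PATTERN.search(log_text)
    return match.group(1) if match else None


# Two lines copied from the error message above.
sample = """       cwd: /tmp/pip-install-ffemy0b0/alembic
  Complete output (16 lines):"""
print(failing_package(sample))  # prints: alembic
```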

1 ACCEPTED SOLUTION

Accepted Solutions

_CV
New Contributor III

Hi Debayan - thanks for the response. It was working before, then it was down for ~24 hrs, and is now working again (nothing changed). I'm still not sure what happened.

In lower environments, we worked around the issue by pip installing the package within individual notebooks, but our production clusters were throwing errors when trying to install the library. I should consider moving to ML clusters where MLFlow is preinstalled.
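A minimal sketch of the notebook-level workaround, assuming the install is run from Python inside the notebook (in Databricks notebooks the `%pip install mlflow==1.26.1` magic is the more common form where the runtime supports it). The retry loop is an added assumption, not something described above; it only helps with short-lived failures. The function name and its parameters are hypothetical.

```python
import subprocess
import sys
import time
from typing import Callable, Sequence


def pip_install_with_retry(
    requirement: str,
    attempts: int = 3,
    delay_seconds: float = 10.0,
    run: Callable[[Sequence[str]], object] = subprocess.check_call,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Install `requirement` with pip, retrying on failure.

    Returns the attempt number that succeeded; re-raises the last
    CalledProcessError if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Use the current interpreter's pip so the package lands in this environment.
    command = [sys.executable, "-m", "pip", "install", requirement]
    for attempt in range(1, attempts + 1):
        try:
            run(command)
            return attempt
        except subprocess.CalledProcessError:
            if attempt == attempts:
                raise
            sleep(delay_seconds)  # wait before the next try
    raise AssertionError("unreachable")
```

Calling `pip_install_with_retry("mlflow==1.26.1")` in the first notebook cell would attempt the install up to three times before giving up.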


3 REPLIES

Debayan
Esteemed Contributor III

Hi, which DBR (Databricks Runtime) version are you using? Based on that, have you already checked the dependency packages and their compatibility with that DBR? Was it working before?

Anonymous
Not applicable

Hi @Chris Valley

Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
