Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Job sometimes failing due to library installation error of PyPI library

ADuma
New Contributor III

I am running a job on a cluster from a compute pool that installs a package from our Azure Artifacts feed. The job runs a wheel task from our library, which has about a dozen dependencies.

For more than 95% of the runs this job works fine, but every now and then it fails with a library installation error.

The run fails with the following error message:
 Library installation failed for library due to user error for pypi {
  package: "our-library"
  repo: "our-feed"
}
 Error messages:
Library installation attempted on the driver node of cluster id and failed. Please refer to the following error message or contact Databricks support. Error code: FAULT_OTHER, error message: org.apache.spark.SparkException: 
java.lang.Throwable: Process List(/bin/su, libraries, -c, bash /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip install 'our-library' --index-url our-feed --disable-pip-version-check) exited with code 1. ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    azure-mgmt-storage==21.1.0 from our-feed/pypi/download/azure-mgmt-storage/21.1/azure_mgmt_storage-21.1.0-py3-none-any.whl#sha256=593f2544fc4f05750c4fe7ca4d83c32ea1e9d266e57899bbf79ce5940124e8cc (from our-library):
        Expected sha256 593f2544fc4f05750c4fe7ca4d83c32ea1e9d266e57899bbf79ce5940124e8cc
             Got        d7782b389ae84fb0d2d4710e3ff4172af06308598b02dfe176165322d4117163

This error occurs for packages that are dependencies of our library, for example pandas or azure-mgmt-storage. I have not been able to figure out why these installation problems occur. The error also occurred for different jobs that were using different entry points of the same package.

One suspicion I have is that the cluster is somehow being reused (as it comes from a compute pool) from a previous run of a different job, and that a cached version of the library is being used. But even then, all our jobs use the same version of the pandas library, so there should not be a difference in the hash.
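As a quick check (and a one-off mitigation), something I could try is inspecting and clearing pip's cache on the driver of a cluster from the same pool, e.g. from a %sh notebook cell. This is only a sketch; the pip path is copied from the error message above and may differ per Databricks runtime:

#!/bin/bash
# Sketch: inspect and clear pip's cache on the driver node.
# The pip path is taken from the error message above and may differ per runtime.
PIP=/local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip

$PIP cache info    # show cache location and size
$PIP cache purge   # drop all cached files so the next install re-downloads every wheel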

 

I wanted to try the --no-cache-dir option for pip but have not been able to set it yet. I hope that ignoring cached packages would avoid this error. I tried setting an environment variable on the job cluster

PIP_NO_CACHE_DIR=true

but the option was still ignored when installing libraries. Does somebody know how to pass options to pip when installing cluster libraries from an Azure feed?
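For reference, one alternative I am considering is a global pip config written by an init script instead of the environment variable. This is only a sketch, assuming the pip run by the cluster-library installer reads /etc/pip.conf, which I have not verified:

#!/bin/bash
# Sketch of a cluster-scoped init script that writes a global pip config.
# Assumption (not verified): the pip used for cluster library installation
# honours /etc/pip.conf.
cat > /etc/pip.conf <<'EOF'
[global]
no-cache-dir = true
EOF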

 

I would appreciate any help, as I'm quite stuck on how to keep this error from appearing.

2 REPLIES

Brahmareddy
Honored Contributor III

Hi ADuma,

How are you doing today? As I understand it, you're on the right track, and this kind of intermittent library install issue is pretty common when using clusters from a compute pool. What's likely happening is that the cluster is reusing a driver node from a previous job, and some cached package files conflict with what pip expects, leading to that hash mismatch error. Setting PIP_NO_CACHE_DIR=true alone won't work here, because Databricks doesn't pass that environment variable to pip automatically during job-based library installs.

A better workaround is to use an init script that runs on cluster startup and installs your package with pip's --no-cache-dir flag. That way you control the install process and avoid pip using any cached versions that could cause issues. Just add a simple shell script to DBFS that runs pip install --no-cache-dir --index-url <your-feed> your-library, and attach it to your job cluster. It should help prevent these random failures. Let me know if you want help setting up the init script!
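For illustration, a minimal sketch of what such an init script could look like; <your-feed-url> and your-library are placeholders, and the pip path is the one commonly used in Databricks init script examples, so adjust it for your runtime if needed:

#!/bin/bash
# Minimal sketch of a cluster-scoped init script that installs the library
# without using pip's cache. <your-feed-url> and your-library are placeholders.
/databricks/python/bin/pip install --no-cache-dir \
    --index-url <your-feed-url> \
    your-library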

Regards,

Brahma

ADuma
New Contributor III

Hi Brahma,

thanks a lot for the help. I'm trying to install my libraries with an init script right now. Unfortunately the error does not occur very regularly, so I'll have to observe for a few days 😄

I'm not 100% happy with the solution, though. We currently set the library version via the cluster specification in the databricks-sdk, which is stored in a repo. It would be extra effort to adjust the installed library version in the init script, which cannot be stored in the repo, if I understood correctly. And since Databricks advises against using init scripts for library installation, I'm not fully convinced.
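One idea I might try is to keep the version in the repo-managed cluster spec and hand it to the init script through an environment variable set in spark_env_vars, roughly like the sketch below. OUR_LIBRARY_VERSION is a made-up variable name, and I have not verified that init scripts see these variables:

#!/bin/bash
# Sketch: read the library version from an environment variable that the
# repo-managed cluster spec sets via spark_env_vars (OUR_LIBRARY_VERSION is made up).
# Assumption (not verified): cluster-scoped init scripts can see these variables.
/databricks/python/bin/pip install --no-cache-dir \
    --index-url <your-feed-url> \
    "our-library==${OUR_LIBRARY_VERSION}"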

But if it works, it might still be the way to go. Thanks for your help!

Andreas
