Job sometimes failing due to library installation error of PyPI library
I am running a job on a cluster from a compute pool that installs a package from our Azure Artifacts feed. The task runs a Python wheel task from our library, which has about a dozen dependencies.
For more than 95% of runs this job works fine, but every now and then it fails with a library installation error.
The run fails with the error message:
Library installation failed for library due to user error for pypi {
package: "our-library"
repo: "our-feed"
}
Error messages:
Library installation attempted on the driver node of cluster id and failed. Please refer to the following error message or contact Databricks support. Error code: FAULT_OTHER, error message: org.apache.spark.SparkException:
java.lang.Throwable: Process List(/bin/su, libraries, -c, bash /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip install 'our-library' --index-url our-feed --disable-pip-version-check) exited with code 1. ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
azure-mgmt-storage==21.1.0 from our-feed/pypi/download/azure-mgmt-storage/21.1/azure_mgmt_storage-21.1.0-py3-none-any.whl#sha256=593f2544fc4f05750c4fe7ca4d83c32ea1e9d266e57899bbf79ce5940124e8cc (from our-library):
Expected sha256 593f2544fc4f05750c4fe7ca4d83c32ea1e9d266e57899bbf79ce5940124e8cc
Got d7782b389ae84fb0d2d4710e3ff4172af06308598b02dfe176165322d4117163
This error occurs for packages that are dependencies of our library, for example the pandas or azure-mgmt-storage libraries. I have not been able to figure out why these installation problems occur. The error has also occurred for different jobs that use different entry points of the same package.
One suspicion I have is that the cluster is somehow being reused (as it comes from a compute pool) from a previous run of a different job, and that a cached version of the library is being used. But even then, all our jobs use the same version of the pandas library, so there should not be a difference in the hash.
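To narrow this down, one thing I still want to try is downloading the affected wheel straight from the feed on a machine outside Databricks and comparing its sha256 against the hash pip expects. Something along these lines (the index URL placeholders stand in for our real feed URL):
# download only the wheel itself (no dependencies) from the feed into a temp dir
pip download azure-mgmt-storage==21.1.0 --no-deps --index-url https://pkgs.dev.azure.com/<org>/_packaging/<feed>/pypi/simple/ -d /tmp/wheel-check
# print its sha256 and compare it with the "Expected sha256" from the error above
sha256sum /tmp/wheel-check/azure_mgmt_storage-21.1.0-py3-none-any.whl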
I wanted to try the --no-cache-dir option for pip but have not been able to set it yet. I hope that ignoring cached packages will avoid this error. I tried setting an environment variable on the job cluster
PIP_NO_CACHE_DIR=true
but the option was still ignored when installing libraries. Does anybody know how to pass options to pip when it installs cluster libraries from an Azure feed?
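For reference, the workaround I was considering is a cluster-scoped init script that writes a global pip config with no-cache-dir, though I am not sure whether the pip that installs cluster libraries actually reads /etc/pip.conf:
#!/bin/bash
# sketch of a cluster-scoped init script: make pip invocations on the node skip the wheel cache
# (whether the Databricks library installer honours /etc/pip.conf is exactly what I am unsure about)
cat > /etc/pip.conf <<'EOF'
[global]
no-cache-dir = true
EOF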
I would appreciate any help, as I'm quite stuck on how to keep this error from appearing.

