
Library installation failed for library due to user error for pypi

AlexMc
New Contributor III

Hi!

I get the error below when a cluster job starts up and tries to install a Python .whl file. (The file is hosted on an Azure Artifacts feed, though this seems more like a problem of reading from disk/network storage.) The failure is seemingly random and intermittent; from the error message it is clearly a networking/timeout problem.

I see that the log below mentions Retry(total=4 ... Is it possible to increase or modify this? Or perhaps add some exponential backoff?

Thanks!
Alex

Library installation attempted on the driver node of cluster xxxxxxxx and failed. Please refer to the following error message or contact Databricks support. Error code: FAULT_OTHER, error message: org.apache.spark.SparkException: Process List(/bin/su, libraries, -c, bash /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip install 'my.company.library==1.0.0' --disable-pip-version-check) exited with code 1.   WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f872e82bf40>, 'Connectio ...

3 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @AlexMc ,

You can try to increase the timeout and the number of retries using pip command-line options:

pip install 'your_library' \
  --timeout 300 \
  --retries 10 \
  --disable-pip-version-check

 

Pat
Esteemed Contributor

I would check the network connection between the cluster and the repository.
The error shows pip retrying after a ConnectTimeoutError, which indicates network connectivity issues when trying to reach the package repository.
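
One way to act on this suggestion is to probe the feed from a notebook on the same cluster. The following is only a minimal sketch: the helper names `can_connect` and `probe` are made up for illustration, and the host you pass in (your feed's hostname) and port 443 are assumptions, since the thread does not show the feed URL. It only checks that a TCP connection opens; it does not test authentication or TLS.

```python
import socket
import time


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures.
        return False


def probe(host: str, port: int = 443, attempts: int = 5,
          timeout: float = 5.0, pause: float = 1.0):
    """Probe several times; return a list of (ok, seconds_taken) tuples."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        ok = can_connect(host, port, timeout)
        results.append((ok, time.monotonic() - start))
        time.sleep(pause)
    return results
```

Calling `probe("<your-feed-host>")` a few times during a period when installs fail may show whether connection attempts from the cluster time out intermittently, which would point at the network path rather than at pip.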

AlexMc
New Contributor III

Thanks both - I think the problem is that this library installation is triggered when creating a new Job & Task via the REST endpoint, where the libraries are specified in the .json file.

So, short version: I don't think I can 'get at' the pip install call in order to add extra parameters. Instead, it sounds like I might have to remove the libraries from the .json and install at the notebook level with a %pip command (where I have more control over the retry logic).
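
For the notebook-level route, a rough sketch of install-with-exponential-backoff might look like the following. It is an illustration, not a confirmed Databricks pattern: `pip_install_with_backoff` is a hypothetical helper, it shells out to `python -m pip` rather than using the `%pip` magic, and whether a library installed this way behaves like a `%pip` install on Databricks (for example, being visible on worker nodes) is an assumption the thread does not settle. The `run` and `sleep` parameters exist only so the retry logic can be exercised without a network.

```python
import subprocess
import sys
import time


def pip_install_with_backoff(requirement, max_attempts=5, base_delay=2.0,
                             extra_args=(), run=subprocess.run, sleep=time.sleep):
    """Run `pip install`, retrying with exponential backoff on failure.

    Returns the attempt number that succeeded; raises RuntimeError if
    every attempt fails.
    """
    cmd = [sys.executable, "-m", "pip", "install", requirement,
           "--disable-pip-version-check", *extra_args]
    for attempt in range(1, max_attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(
        f"pip install {requirement!r} failed after {max_attempts} attempts"
    )
```

For example, `pip_install_with_backoff("my.company.library==1.0.0", extra_args=["--timeout", "300", "--retries", "10"])` would combine pip's own per-request retries with an outer retry of the whole install.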