Hi!
I get the error below when a cluster job starts up and tries to install a Python .whl file. (The wheel is hosted on an Azure Artifacts feed, though this looks more like a problem reading from disk/network storage.) The failure is seemingly random and intermittent, and from the error message it is clearly a networking/timeout problem.
I see the log below mentions Retry(total=4 ... Is it possible to increase/modify this, or perhaps add some exponential backoff?
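For context, if I were running pip myself I'd reach for the standard pip knobs below to bump the retry count and timeout (these are ordinary pip options; I'm not sure whether the Databricks cluster-wide library installer would pick up environment variables or an /etc/pip.conf written from an init script, which is really what I'm asking):

# per-invocation flags (pip's default is --retries 5, which matches the Retry(total=4 ... in the log)
pip install 'my.company.library==1.0.0' --retries 10 --timeout 60

# or the equivalent via environment variables, e.g. exported from a cluster init script
export PIP_RETRIES=10
export PIP_DEFAULT_TIMEOUT=60

# or a global pip.conf written by an init script
cat > /etc/pip.conf <<'EOF'
[global]
retries = 10
timeout = 60
EOF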
Thanks!
Alex
Library installation attempted on the driver node of cluster xxxxxxxx and failed. Please refer to the following error message or contact Databricks support. Error code: FAULT_OTHER, error message: org.apache.spark.SparkException: Process List(/bin/su, libraries, -c, bash /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip install 'my.company.library==1.0.0' --disable-pip-version-check) exited with code 1. WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f872e82bf40>, 'Connectio ...