
!pip install vs. dbutils.library.installPyPI()

EricThomas
New Contributor

Hello,

Scenario:

Trying to install some Python modules into a notebook (scoped to just that notebook) using...

```
dbutils.library.installPyPI("azure-identity")
dbutils.library.installPyPI("azure-storage-blob")
dbutils.library.restartPython()
```

...getting the (unclear) error...

```
org.apache.spark.SparkException: Process List(/local_disk0/pythonVirtualEnvDirs/virtualEnv-34b93f38-5a4f-41eb-a754-f16697cd339c/bin/python, /local_disk0/pythonVirtualEnvDirs/virtualEnv-34b93f38-5a4f-41eb-a754-f16697cd339c/bin/pip, install, azure-storage-blob==12.0.0, --disable-pip-version-check) exited with code 1.

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<command-3781868905499817> in <module>()
      1 dbutils.library.installPyPI("azure-identity")
----> 2 dbutils.library.installPyPI("azure-storage-blob", version="12.0.0")
      3 dbutils.library.restartPython()

/local_disk0/tmp/1587770610080-0/dbutils.py in installPyPI(self, project, version, repo, extras)
    237   def installPyPI(self, project, version = "", repo = "", extras = ""):
    238     return self.print_and_return(self.entry_point.getSharedDriverContext() \
--> 239       .addIsolatedPyPILibrary(project, version, repo, extras))
    240
    241   def restartPython(self):

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258
   1259         for temp_arg in temp_args:
```

Whereas

`!pip install -U azure-storage-blob`

seems to work just fine.

Questions:

1. Why is this?

2. At what scope does `!pip install` install Python modules?
   - Notebook scope
   - Library
   - Cluster
Thank you!

2 Replies

eishbis
New Contributor II

Hi @ericOnline

I also faced the same issue, and I eventually found that upgrading the Databricks Runtime version from my current "5.5 LTS (includes Apache Spark 2.4.3, Scala 2.11)" to "6.5 (Scala 2.11, Spark 2.4.5)" resolved it.

Though the official documentation says that dbutils.library.installPyPI is supported from runtime version 5.1 onward, that does not seem to be the case here.
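
In case it helps others confirm which runtime a cluster is actually running before upgrading, here is a minimal sketch; the DATABRICKS_RUNTIME_VERSION environment variable and the clusterUsageTags conf key are assumptions to verify on your own cluster, and `spark` is the SparkSession that Databricks notebooks provide:

```
# Minimal sketch: check the Databricks Runtime version from inside a notebook.
# Assumption: DATABRICKS_RUNTIME_VERSION is set for notebook Python processes.
import os

print(os.environ.get("DATABRICKS_RUNTIME_VERSION"))  # e.g. "6.5"

# Assumption: this cluster usage tag is populated on your runtime.
# `spark` is the SparkSession that Databricks injects into notebooks.
print(spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion", "unknown"))
```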

Thanks

Ishan

eishbis
New Contributor II

Further, I found that dbutils.library.installPyPI is supported on the 5.5 LTS runtime as well. In my case, I had some PyPI packages installed at the cluster level. I removed those cluster-level PyPI packages and used dbutils.library.installPyPI to install notebook-scoped packages instead. It works fine now.
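
For reference, a minimal sketch of the sequence that worked, using the same packages as above (the version pin is illustrative, not required):

```
# Notebook-scoped installs; these failed for me while conflicting
# cluster-level copies of the same packages were installed.
dbutils.library.installPyPI("azure-identity")
dbutils.library.installPyPI("azure-storage-blob", version="12.0.0")  # optional pin

# Restart the notebook's Python process so the new packages are importable.
# Note: this clears the notebook's Python state (variables, imports).
dbutils.library.restartPython()
```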
