
!pip install vs. dbutils.library.installPyPI()

EricThomas
New Contributor

Hello,

Scenario:

Trying to install some Python modules into a notebook (scoped to just that notebook) using...

```
dbutils.library.installPyPI("azure-identity")
dbutils.library.installPyPI("azure-storage-blob")
dbutils.library.restartPython()
```

...getting the (unclear) error...

```
org.apache.spark.SparkException: Process List(/local_disk0/pythonVirtualEnvDirs/virtualEnv-34b93f38-5a4f-41eb-a754-f16697cd339c/bin/python, /local_disk0/pythonVirtualEnvDirs/virtualEnv-34b93f38-5a4f-41eb-a754-f16697cd339c/bin/pip, install, azure-storage-blob==12.0.0, --disable-pip-version-check) exited with code 1. Traceback (most recent call last):

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<command-3781868905499817> in <module>()
      1 dbutils.library.installPyPI("azure-identity")
----> 2 dbutils.library.installPyPI("azure-storage-blob", version="12.0.0")
      3 dbutils.library.restartPython()

/local_disk0/tmp/1587770610080-0/dbutils.py in installPyPI(self, project, version, repo, extras)
    237     def installPyPI(self, project, version = "", repo = "", extras = ""):
    238         return self.print_and_return(self.entry_point.getSharedDriverContext() \
--> 239             .addIsolatedPyPILibrary(project, version, repo, extras))
    240
    241     def restartPython(self):

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258
   1259         for temp_arg in temp_args:
```

Whereas

```
!pip install -U azure-storage-blob
```

seems to work just fine.

Questions:

1. Why is this?

2. At what scope does `!pip install` install Python modules? (See the sketch after this list.)
   - Notebook scope
   - Library
   - Cluster
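
For reference, here are the two patterns side by side, as separate cells; the scope notes in the comments are my understanding from the docs, not something I've verified:

```
# Cell 1 - notebook-scoped install (documented for DBR 5.1+): installs into
# an isolated virtualenv that only this notebook sees.
dbutils.library.installPyPI("azure-identity")
dbutils.library.installPyPI("azure-storage-blob", version="12.0.0")
dbutils.library.restartPython()  # restart so the new packages are importable

# Cell 2 - shell escape: pip runs in the driver's default Python environment,
# so the package is visible to any notebook attached to this cluster
# (driver side only; executor nodes are not updated).
!pip install -U azure-storage-blob
```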

Thank you!

2 REPLIES

eishbis
New Contributor II

Hi @ericOnline

I faced the same issue and eventually found that upgrading the Databricks Runtime version from my current "5.5 LTS (includes Apache Spark 2.4.3, Scala 2.11)" to "6.5 (includes Apache Spark 2.4.5, Scala 2.11)" resolved it.

The official documentation says that dbutils.library.installPyPI is supported on runtime version 5.1 and above, but that did not seem to be the case here.

Thanks

Ishan

eishbis
New Contributor II

Further, I found that dbutils.library.installPyPI is in fact supported on the 5.5 LTS runtime. In my case, I had some PyPI packages installed at the cluster level. I removed those cluster-level PyPI packages and used dbutils.library.installPyPI to install notebook-scoped packages instead. It works fine now.
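
A minimal sketch of that working setup, assuming no conflicting cluster-level PyPI libraries remain installed (package names carried over from the question; the import check is illustrative):

```
# On 5.5 LTS, with the conflicting cluster-level PyPI libraries removed
# (via the cluster's Libraries tab), the notebook-scoped install succeeds:
dbutils.library.installPyPI("azure-identity")
dbutils.library.installPyPI("azure-storage-blob", version="12.0.0")
dbutils.library.restartPython()

# In a later cell, verify the notebook-scoped package is importable:
# from azure.storage.blob import BlobServiceClient
```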
