
Databricks Default package repositories

tariq
New Contributor III

I have added an extra-index-url to the default package repositories setting in Databricks, pointing to a feed in Azure Artifacts. Libraries from that feed install fine on job clusters, but the setup is not working on the all-purpose cluster. Below is the relevant config for the all-purpose cluster:

{
    "cluster_id": "some-id",

    "data_security_mode": "USER_ISOLATION",
    "effective_spark_version": "16.4.x-scala2.12",
    "release_version": "16.4.8",
    "autoscale": {
        "min_workers": 1,
        "max_workers": 15,
        "target_workers": 1
    },
    "init_scripts_safe_mode": false,
    "spec": {
        "cluster_name": "cluster_name",
        "spark_version": "16.4.x-scala2.12",
        "autotermination_minutes": 15,
        "instance_pool_id": "some-pool",
        "driver_instance_pool_id": "some-pool",
        "data_security_mode": "USER_ISOLATION",
        "autoscale": {
            "min_workers": 1,
            "max_workers": 15
        },
        "apply_policy_default_values": false
    }
}
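
For reference, a quick way to confirm which index URLs pip actually sees on the all-purpose cluster is to run something like the following from a notebook attached to it (a minimal sketch; `pip config list` is a standard pip subcommand):

```python
# Sketch: show which pip configuration is in effect for the notebook's interpreter,
# to confirm whether the workspace-level default package repositories reached this compute.
import subprocess
import sys

print("Notebook Python:", sys.executable)

result = subprocess.run(
    [sys.executable, "-m", "pip", "config", "list"],
    capture_output=True,
    text=True,
)
print(result.stdout or "(no pip configuration found)")
```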

Isi
Honored Contributor II

Hello @tariq 

My recommendation is to create an init script and attach it to the all-purpose cluster.

#!/bin/bash
# Write a cluster-wide pip config that points at the private Azure Artifacts feed.

echo "[global]
index-url = https://<user>:<token>@pkgs.dev.azure.com/<org>/<project>/_packaging/<feed>/pypi/simple/
extra-index-url = https://pypi.org/simple
trusted-host = pkgs.dev.azure.com
" > /etc/pip.conf

You can upload it to your cloud storage, and you must also add it under Catalog > Metastore > Allowed JARs/Init Scripts.


Hope this helps 🙂

Isi

BigRoux
Databricks Employee

Greetings @tariq , this is a great question (and thank you @Isi for the suggestion). I looked into our internal documentation, and it turns out that installing libraries cluster-wide on "All Purpose" compute in `USER_ISOLATION` mode is not recommended. Databricks enforces strict separation between users, including how libraries are installed and loaded and how environment variables are managed.

Key Points to Consider

- USER_ISOLATION clusters strictly restrict cross-user contamination. Init scripts and global environment variables are not always passed into the per-user, per-notebook Python context.

- Cluster-scoped installations (e.g., via init scripts with `pip install`, or through the cluster "Libraries" UI) often do not work as expected in notebook sessions under USER_ISOLATION; a quick visibility check is sketched after this list.

- Instead, per-user `%pip` installs are isolated and recommended.
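
As a quick check of that isolation boundary, something like the following (a minimal sketch; `private_pkg` is a placeholder for a library your init script installs) shows whether a cluster-wide install is visible from the notebook's own Python environment:

```python
# Sketch: check whether a package installed cluster-wide (e.g. by an init script)
# is visible from the per-user notebook session on a USER_ISOLATION cluster.
import importlib.util
import sys

print("Notebook Python:", sys.executable)

private_pkg = "private_pkg"  # placeholder: a library from your private feed
visible = importlib.util.find_spec(private_pkg) is not None
print(f"{private_pkg} visible in this session: {visible}")
```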

Recommended Approach

Install within Notebook Sessions

```python

%pip install --extra-index-url <azure-artifact-repo-url> package-name

```

Run this directly in your own notebook. This ensures the package is installed in your user session and correctly respects isolation and credentials. 
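
If the Azure Artifacts feed requires authentication, one way to supply it per session without hard-coding a token is to pull it from a Databricks secret scope. A minimal sketch, assuming a hypothetical scope `artifact-feeds` with key `azure-devops-pat`; it installs with the notebook's own interpreter, similar in effect to `%pip install`:

```python
# Sketch (run in a notebook): install from an Azure Artifacts feed using a PAT
# stored in a Databricks secret scope. Scope/key names and <org>/<project>/<feed>
# are placeholders; dbutils is available in Databricks notebooks without import.
import subprocess
import sys

token = dbutils.secrets.get(scope="artifact-feeds", key="azure-devops-pat")

extra_index = (
    f"https://build:{token}@pkgs.dev.azure.com/"
    "<org>/<project>/_packaging/<feed>/pypi/simple/"
)

# Targets the same interpreter the notebook session uses.
subprocess.check_call(
    [sys.executable, "-m", "pip", "install",
     "--extra-index-url", extra_index, "<package-name>"]
)
```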

If You Must Use Cluster Init Scripts

If pre-installation cluster-wide is absolutely necessary, you can test with an init script that explicitly uses the Python executable for notebook sessions. For example: 

```bash

#!/bin/bash

# Install into the Python environment that notebook sessions use on this cluster.
/databricks/python/bin/pip install --extra-index-url <azure-artifact-repo-url> <package-name>

```

- Save this script in workspace storage or a mounted volume. 

- Add it to the `init_scripts` section of the cluster configuration. 

Afterwards, verify installation: 

```python

import sys

# Which Python interpreter this notebook session is using
print(sys.executable)

# Packages visible to that interpreter
!pip list

```

Keep in mind: due to `USER_ISOLATION` boundaries, even init scripts may not guarantee availability across all user sessions. Installing with `%pip` inside each notebook is usually more reliable. 

In Short

On all-purpose clusters with `USER_ISOLATION`, use `%pip install` with your extra index URL directly in your notebook, and ensure authentication is set for each user session. Init scripts are possible but less reliable for propagating libraries across users. 

Hope this helps point you in the right direction! 

Cheers, Louis.

 

Isi
Honored Contributor II

Hello @BigRoux 

I understand Databricks’ best practices, but in my experience, for libraries that are already present in the cluster runtime (e.g., pydantic), I haven’t been able to make %pip install consistently overwrite the preinstalled version. The only reliable way I’ve found is installing them through the cluster Libraries UI; otherwise, when I run %pip install, the version I specify doesn’t seem to take effect because of how the environment isolation works.
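
For what it’s worth, a minimal way to see which copy actually wins in the session after a pinned install (a sketch; pydantic is just the example above):

```python
# Sketch: after e.g. `%pip install pydantic==<pinned-version>`, check which
# version is active in this session.
import importlib.metadata

print(importlib.metadata.version("pydantic"))

# If the runtime's preinstalled copy still wins, restarting the notebook's
# Python process can help (Databricks notebooks only):
# dbutils.library.restartPython()
```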

 

That said, when working directly at the cluster level, I have been able to get it to work. I understand the point about USER_ISOLATION, but when you face restrictions (e.g., ML runtimes or other special environments), sometimes you have to rely on these “tricks” to get things working.

 

Still, I really appreciate your explanation. It would be great if Databricks could put more emphasis on making this kind of internal documentation available more openly, since it would save users a lot of trial and error.

Thanks,
Isi
