
Databricks Default package repositories

tariq
New Contributor III

I have added an extra-index-url to the default package repositories setting in Databricks, pointing to a feed in Azure Artifacts. Libraries from that feed install fine on job clusters, but the setup is not working on the all-purpose cluster. Below is the relevant config for the all-purpose cluster:

{
    "cluster_id": "some-id",

    "data_security_mode": "USER_ISOLATION",
    "effective_spark_version": "16.4.x-scala2.12",
    "release_version": "16.4.8",
    "autoscale": {
        "min_workers": 1,
        "max_workers": 15,
        "target_workers": 1
    },
    "init_scripts_safe_mode": false,
    "spec": {
        "cluster_name": "cluster_name",
        "spark_version": "16.4.x-scala2.12",
        "autotermination_minutes": 15,
        "instance_pool_id": "some-pool",
        "driver_instance_pool_id": "some-pool",
        "data_security_mode": "USER_ISOLATION",
        "autoscale": {
            "min_workers": 1,
            "max_workers": 15
        },
        "apply_policy_default_values": false
    }
}
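
For reference, a quick way to confirm which index URLs pip actually sees on the all-purpose cluster is to run something like the following from a notebook attached to it (a minimal sketch; `pip config list` is a standard pip subcommand):

```python
# Sketch: show which pip configuration is in effect for the notebook's interpreter,
# to confirm whether the workspace-level default package repositories reached this compute.
import subprocess
import sys

print("Notebook Python:", sys.executable)

result = subprocess.run(
    [sys.executable, "-m", "pip", "config", "list"],
    capture_output=True,
    text=True,
)
print(result.stdout or "(no pip configuration found)")
```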

Isi
Honored Contributor II

Hello @tariq 

My recommendation is to create an init script and attach it to the all-purpose cluster.

#!/bin/bash
# Write a cluster-wide pip config that points at the private Azure Artifacts feed.

echo "[global]
index-url = https://<user>:<token>@pkgs.dev.azure.com/<org>/<project>/_packaging/<feed>/pypi/simple/
extra-index-url = https://pypi.org/simple
trusted-host = pkgs.dev.azure.com
" > /etc/pip.conf

You can upload it to your cloud storage, and you must also add it under Catalog > Metastore > Allowed JARs/Init Scripts.


Hope this helps 🙂

Isi

BigRoux
Databricks Employee

Greetings @tariq , this is a great question (and thank you @Isi for the suggestion). I looked into our internal documentation, and it turns out that installing libraries cluster-wide on "All Purpose" compute in `USER_ISOLATION` mode is not recommended. Databricks enforces strict separation between users, including how libraries are installed and loaded and how environment variables are managed.

Key Points to Consider

- USER_ISOLATION clusters strictly restrict cross-user contamination. Init scripts and global environment variables are not always passed into the per-user, per-notebook Python context.

- Cluster-scoped installations (e.g., via init scripts with `pip install`, or through the cluster "Libraries" UI) often do not work as expected in notebook sessions under USER_ISOLATION; a quick visibility check is sketched after this list.

- Instead, per-user `%pip` installs are isolated and recommended.
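
As a quick check of that isolation boundary, something like the following (a minimal sketch; `private_pkg` is a placeholder for a library your init script installs) shows whether a cluster-wide install is visible from the notebook's own Python environment:

```python
# Sketch: check whether a package installed cluster-wide (e.g. by an init script)
# is visible from the per-user notebook session on a USER_ISOLATION cluster.
import importlib.util
import sys

print("Notebook Python:", sys.executable)

private_pkg = "private_pkg"  # placeholder: a library from your private feed
visible = importlib.util.find_spec(private_pkg) is not None
print(f"{private_pkg} visible in this session: {visible}")
```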

Recommended Approach

Install within Notebook Sessions

```python

%pip install --extra-index-url <azure-artifact-repo-url> package-name

```

Run this directly in your own notebook. This ensures the package is installed in your user session and correctly respects isolation and credentials. 
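
If the Azure Artifacts feed requires authentication, one way to supply it per session without hard-coding a token is to pull it from a Databricks secret scope. A minimal sketch, assuming a hypothetical scope `artifact-feeds` with key `azure-devops-pat`; it installs with the notebook's own interpreter, similar in effect to `%pip install`:

```python
# Sketch (run in a notebook): install from an Azure Artifacts feed using a PAT
# stored in a Databricks secret scope. Scope/key names and <org>/<project>/<feed>
# are placeholders; dbutils is available in Databricks notebooks without import.
import subprocess
import sys

token = dbutils.secrets.get(scope="artifact-feeds", key="azure-devops-pat")

extra_index = (
    f"https://build:{token}@pkgs.dev.azure.com/"
    "<org>/<project>/_packaging/<feed>/pypi/simple/"
)

# Targets the same interpreter the notebook session uses.
subprocess.check_call(
    [sys.executable, "-m", "pip", "install",
     "--extra-index-url", extra_index, "<package-name>"]
)
```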

If You Must Use Cluster Init Scripts

If pre-installation cluster-wide is absolutely necessary, you can test with an init script that explicitly uses the Python executable for notebook sessions. For example: 

```bash

#!/bin/bash

# Install into the Python environment that notebook sessions use on this cluster.
/databricks/python/bin/pip install --extra-index-url <azure-artifact-repo-url> <package-name>

```

- Save this script in workspace storage or a mounted volume. 

- Add it to the `init_scripts` section of the cluster configuration. 

Afterwards, verify installation: 

```python

import sys

# Which Python interpreter this notebook session is using
print(sys.executable)

# Packages visible to that interpreter
!pip list

```

Keep in mind: due to `USER_ISOLATION` boundaries, even init scripts may not guarantee availability across all user sessions. Installing with `%pip` inside each notebook is usually more reliable. 

In Short

On all-purpose clusters with `USER_ISOLATION`, use `%pip install` with your extra index URL directly in your notebook, and ensure authentication is set for each user session. Init scripts are possible but less reliable for propagating libraries across users. 

Hope this helps point you in the right direction! 

Cheers, Louis.

 

Isi
Honored Contributor II

Hello @BigRoux 

I understand Databricks’ best practices, but in my experience, for libraries that are already present in the cluster runtime (e.g., pydantic), I haven’t been able to make %pip install consistently overwrite the preinstalled version. The only reliable way I’ve found is installing them through the cluster Libraries UI; otherwise, when I run %pip install, the version I specify doesn’t seem to take effect because of how the environment isolation works.
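
For what it’s worth, a minimal way to see which copy actually wins in the session after a pinned install (a sketch; pydantic is just the example above):

```python
# Sketch: after e.g. `%pip install pydantic==<pinned-version>`, check which
# version is active in this session.
import importlib.metadata

print(importlib.metadata.version("pydantic"))

# If the runtime's preinstalled copy still wins, restarting the notebook's
# Python process can help (Databricks notebooks only):
# dbutils.library.restartPython()
```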

 

That said, when working directly at the cluster level, I have been able to get it to work. I understand the point about USER_ISOLATION, but when you face restrictions (e.g., ML runtimes or other special environments), sometimes you have to rely on these “tricks” to get things working.

 

Still, I really appreciate your explanation. It would be great if Databricks could put more emphasis on making this kind of internal documentation available more openly, since it would save users a lot of trial and error.

Thanks,
Isi
