04-04-2022 03:57 AM
Hi
I would like to use an Azure Artifacts feed as my default index-url when doing a pip install on a Databricks cluster. I understand I can achieve this by updating the pip.conf file with my artifact feed as the index-url. Does anyone know where in the filesystem I would update that, or have a global init script that achieves it?
My second question: I would like users to be able to pip install from a notebook, while having that exact same notebook work when run in a job by a service principal, which can't authenticate as a user.
I believe that by using artifacts-keyring I should be able to add an environment variable on the jobs cluster that forces it to use a PAT to authenticate to the artifact feed. Has anyone tried this, or does anyone have an example of doing something similar?
Many Thanks
Mat
05-12-2022 10:51 PM
For your first question, these pages may help: https://docs.databricks.com/libraries/index.html#python-environment-management and https://docs.databricks.com/libraries/notebooks-python-libraries.html#manage-libraries-with-pip-comm... You can also convert this into an init script.
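As a rough illustration of the pip.conf approach asked about in the question, here is a minimal Python sketch that renders a pip.conf with a custom index-url. The feed URL, the `render_pip_conf` helper name, and the idea of writing the result to /etc/pip.conf (pip's global config location on Linux) from an init script are assumptions for illustration, not something confirmed in this thread.

```python
import configparser
import io


def render_pip_conf(index_url: str) -> str:
    """Return the text of a pip.conf whose [global] section sets index-url."""
    # interpolation=None so that '%' characters in a URL are kept literally
    parser = configparser.ConfigParser(interpolation=None)
    parser["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical feed URL; replace with your own Azure Artifacts feed
    feed = "https://pkgs.dev.azure.com/my_org/_packaging/my_feed/pypi/simple/"
    print(render_pip_conf(feed))
```

An init script could write this text to the cluster's pip configuration file so that every pip install on the node picks it up. Authentication still has to be handled separately.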
06-07-2022 11:12 AM
Hi @Mathew Walters,
Just a friendly follow-up. Do you still need help, or did Atanu's response help you resolve your issue?
09-26-2024 11:40 AM
Reviving this thread as I'm having the same issue. While embedding the PAT would work to install a package directly into my environment, my requirement is to install packages that have libraries hosted in artifact feeds as dependencies. Since the dependency spec is external to Databricks in that situation, I don't think I can retrieve a PAT from a secret scope to accomplish this. Outside of Databricks, as described above, I'd authenticate to DevOps with something like az login and then the credential would be accessible through artifacts-keyring. I'm not sure how to reproduce this in Databricks, though.
a month ago - last edited a month ago
Hello @ipreston,
My team is currently using Databricks Asset Bundles (DAB). To solve a similar issue, retrieving private libraries hosted in an Azure DevOps Artifacts feed, we added the following.
To the databricks.yml:
targets:
  dev:
    mode: development
    default: true
    run_as:
      user_name: oishiiramen@chuushoku.com
resources:
  jobs:
    our_job:
      job_clusters:
        - job_cluster_key: ${bundle.target}-${bundle.name}-job-cluster
          new_cluster:
            num_workers: 2
            spark_version: "14.3.x-cpu-ml-scala2.12"
            node_type_id: Standard_F8
            spark_env_vars:
              PIP_EXTRA_INDEX_URL: "{{secrets/our_secret/our_url}}"
To the pyproject.toml:
[[tool.poetry.source]]
name = "our_enterprise"
url = "https://our_enterprise.pkgs.visualstudio.com/_packaging/our_enterprise/pypi/simple"
priority = "supplemental"
With that I can retrieve the libraries from the artifact feed.
2 weeks ago
How do you authenticate in this case?
a week ago
For authentication, you can set the following in the cluster's Spark environment variables:
PIP_EXTRA_INDEX_URL=https://username:password@pkgs.sample.com/sample/_packaging/artifactory_name/pypi/simple/
You can also store the value in a Databricks secret and reference it from the environment variable instead of writing the credentials in plain text.
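If the username or password (or PAT) contains characters such as `@` or `/`, they need to be percent-encoded before being embedded in the URL. Below is a minimal sketch of building such a value in Python. The `with_credentials` helper and the sample credentials are hypothetical; the feed URL is the placeholder from the post above.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(feed_url: str, username: str, secret: str) -> str:
    """Return feed_url with percent-encoded username:secret in front of the host."""
    parts = urlsplit(feed_url)
    # safe="" forces encoding of '/', '@' and ':' inside the credentials
    userinfo = f"{quote(username, safe='')}:{quote(secret, safe='')}"
    netloc = f"{userinfo}@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Placeholder values only; never hard-code a real token
    url = with_credentials(
        "https://pkgs.sample.com/sample/_packaging/artifactory_name/pypi/simple/",
        "username",
        "p@ss/word",
    )
    print(f"PIP_EXTRA_INDEX_URL={url}")
```

The resulting string is what you would store in the Databricks secret, so that the credentials never appear in the cluster configuration itself.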