Data Engineering

Configuring pip index-url and using artifacts-keyring

Confused
New Contributor III

Hi

I would like to use an Azure Artifacts feed as my default index-url when running pip install on a Databricks cluster. I understand I can achieve this by updating the pip.conf file with my artifact feed as the index-url. Does anyone know where in the filesystem I would update that file, or have a global init script that does this?

My second question: I would like users to be able to pip install from a notebook, and have that exact same notebook work when it is run in a job by a service principal, which can't authenticate as a user.

I believe that with artifacts-keyring I should be able to add an environment variable on the job cluster that forces it to use a PAT to authenticate to the artifact feed. Has anyone tried this, or does anyone have an example of doing something similar?

Many Thanks

Mat
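
For reference, the pip.conf approach mentioned in the question can be sketched as below. This is only an illustration, not a tested Databricks init script: the function name `render_pip_conf` and the feed URL are hypothetical. On Linux, pip reads a global configuration file from /etc/pip.conf; whether an init script on a given cluster should write to that path is an assumption to verify.

```python
import configparser
import io


def render_pip_conf(index_url: str) -> str:
    """Return pip.conf text that sets index_url as pip's default index."""
    # interpolation=None lets the URL contain '%' (for example percent-encoded
    # credentials) without configparser rejecting it as interpolation syntax.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical feed URL; substitute your own Azure Artifacts feed.
    print(render_pip_conf("https://pkgs.example.invalid/_packaging/my_feed/pypi/simple/"))
```

The rendered text is what would go into the pip configuration file; writing it to disk is left out here because the target path and permissions depend on the cluster setup.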

1 ACCEPTED SOLUTION

Accepted Solutions

Atanu
Databricks Employee
6 REPLIES

Atanu
Databricks Employee

jose_gonzalez
Databricks Employee

Hi @Mathew Walters,

Just a friendly follow-up. Do you still need help, or did Atanu's response help you resolve your issue?

ipreston
New Contributor III

Reviving this thread as I'm having the same issue. Embedding the PAT would work for installing a package directly into my environment, but my requirement is to install packages whose dependencies are themselves hosted in artifact feeds. Since the dependency spec is external to Databricks in that situation, I don't think I can retrieve a PAT from a secret scope to accomplish this. Outside of Databricks, as described above, I'd authenticate to DevOps with something like az login, and the credential would then be accessible through artifacts-keyring. I'm not sure how to reproduce this in Databricks, though.

PabloCSD
Valued Contributor

Hello @ipreston,

On my team we are currently using Databricks Asset Bundles (DAB). To solve a similar issue, retrieving private libraries hosted in an Azure DevOps Artifacts feed, we added the following.

To the databricks.yml:

 

targets:
  dev:
    mode: development
    default: true
    run_as:
      user_name: oishiiramen@chuushoku.com
    resources:
      jobs:
        our_job:
          job_clusters:
            - job_cluster_key: ${bundle.target}-${bundle.name}-job-cluster
              new_cluster:
                num_workers: 2
                spark_version: "14.3.x-cpu-ml-scala2.12"
                node_type_id: Standard_F8
                spark_env_vars:
                  PIP_EXTRA_INDEX_URL: "{{secrets/our_secret/our_url}}"

 

To the pyproject.toml:

 

[[tool.poetry.source]]
name = "our_enterprise"
url = "https://our_enterprise.pkgs.visualstudio.com/_packaging/our_enterprise/pypi/simple"
priority = "supplemental"

 

With that I can retrieve the libraries from the artifact feed.
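
pip reads PIP_EXTRA_INDEX_URL from the environment, so one way to confirm that the cluster picked up the setting is to inspect it from a notebook cell. The sketch below is only an illustration under that assumption: the helper names `redact_credentials` and `describe_extra_index` are hypothetical, and any credentials are masked so they do not end up in notebook output.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def redact_credentials(url: str) -> str:
    """Return url with any embedded user:password replaced by '***'."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, f"***@{host}", parts.path, parts.query, parts.fragment))


def describe_extra_index(env=os.environ) -> str:
    """Summarise PIP_EXTRA_INDEX_URL without exposing credentials."""
    value = env.get("PIP_EXTRA_INDEX_URL")
    if not value:
        return "PIP_EXTRA_INDEX_URL is not set"
    # The variable may hold several space-separated URLs.
    redacted = " ".join(redact_credentials(u) for u in value.split())
    return f"PIP_EXTRA_INDEX_URL = {redacted}"


if __name__ == "__main__":
    print(describe_extra_index())
```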

SergeyK
New Contributor II

How do you authenticate in this case?

murtazahzaveri
New Contributor II

For authentication, you can set the following in the cluster's Spark environment variables:
PIP_EXTRA_INDEX_URL=https://username:password@pkgs.sample.com/sample/_packaging/artifactory_name/pypi/simple/

You can also store the value in a Databricks secret and reference the secret from the environment variable, rather than writing the credentials in plain text.
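
One detail worth watching with credentials embedded in the URL: if the username or token contains characters such as `@`, `:` or `/`, they need to be percent-encoded or the URL will not parse as intended. A minimal sketch of building such a value, assuming the sample feed URL above (the function name `with_credentials` and the example token are hypothetical):

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(index_url: str, username: str, token: str) -> str:
    """Return index_url with percent-encoded credentials embedded in it."""
    parts = urlsplit(index_url)
    # safe='' encodes every reserved character, so '@', ':' or '/' inside
    # the username or token cannot be mistaken for URL delimiters.
    userinfo = f"{quote(username, safe='')}:{quote(token, safe='')}"
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, f"{userinfo}@{host}", parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Hypothetical token, shown only to illustrate the encoding.
    print(with_credentials(
        "https://pkgs.sample.com/sample/_packaging/artifactory_name/pypi/simple/",
        "username",
        "p@ss:word/1",
    ))
```

The resulting string is the kind of value that would be stored in the secret and exposed as PIP_EXTRA_INDEX_URL.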
