Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Install a custom Python package from an Azure DevOps Artifacts feed on a Databricks cluster

Sulfikkar
Contributor

I am trying to install a package that was uploaded to an Azure DevOps Artifacts feed onto a Databricks cluster, using pip.conf to point pip at the feed.

Below are the steps I followed (step 1: install in the local IDE):

  1. Uploaded the package to the Azure DevOps feed using twine.
  2. Created a PAT (personal access token) in Azure DevOps.
  3. Created a pip.conf on my local machine and put the PAT into the index URL there (see the example sketch after this list).
  4. Installed the library in my local IDE.
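For illustration, a minimal pip.conf for an Azure DevOps Artifacts feed looks roughly like this; the organization, project, feed name, and PAT below are placeholders rather than the real values:

[global]
# Azure DevOps accepts any username as long as the PAT is supplied as the password
extra-index-url=https://build:<PAT>@pkgs.dev.azure.com/<organization>/<project>/_packaging/<feed-name>/pypi/simple/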

Up to step 4 everything works fine. However, when I try to replicate the same setup to install the package on an Azure Databricks cluster, it fails. Below are the steps I followed (step 2: install on the Databricks cluster):

  1. Stored the PAT as a secret in Azure Key Vault.
  2. Created a Databricks secret scope backed by that Azure Key Vault.
  3. Used an environment variable on the cluster to access the secret through the secret scope.
  4. Created an init script that writes the index URL into the /etc/pip.conf file (see the sketch after this list).
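For illustration, an init script along these lines is the kind of thing I mean; the environment variable name, the secret reference, and the feed path are placeholder assumptions, not the real configuration:

#!/bin/bash
# Cluster init script (sketch): point pip at the private Azure DevOps feed.
# DEVOPS_PAT is assumed to be set as a cluster environment variable backed by
# the secret scope, e.g. DEVOPS_PAT={{secrets/<scope-name>/<secret-name>}}
cat > /etc/pip.conf <<EOF
[global]
extra-index-url=https://build:${DEVOPS_PAT}@pkgs.dev.azure.com/<organization>/<project>/_packaging/<feed-name>/pypi/simple/
EOF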

I have checked the secret list and the scopes, and everything is valid. However, when I try to install the package on the cluster through the PyPI option, pip does not point at the index URL I provided in the pip.conf. I also tried running the pip install command from a notebook and passing the index URL explicitly (roughly as sketched below), and it says the package version is not found in the DevOps Artifacts feed, while the same command works fine in my local IDE.
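For reference, the notebook command I mean is along these lines; the feed path and PAT are placeholders:

%pip install RelayDataVault==0.5.3 --index-url https://build:<PAT>@pkgs.dev.azure.com/<organization>/<project>/_packaging/<feed-name>/pypi/simple/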

This is the error I get when I pass the index URL and try to install the package manually from the cluster:

ERROR: Could not find a version that satisfies the requirement RelayDataVault==0.5.3  (from versions: none)

ERROR: No matching distribution found for RelayDataVault==0.5.3 

The same package installs fine in my local IDE.

Also, as I mentioned, by default when I install the package from the cluster, pip is not searching the index URL I set in the /etc/pip.conf file.
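One way to confirm which configuration files and index URLs pip actually picks up on the cluster is pip's own config listing; a minimal check from a notebook shell cell, assuming pip is on the driver's PATH, would be:

%sh
# Show which pip config files are loaded and the index URLs they define
pip config list -v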

Does anyone have a clue here?

1 ACCEPTED SOLUTION

Sulfikkar
Contributor

Thanks for your time @Debayan Mukherjee and @Kaniz Fatma. Together with the infra team, we figured out the issue: we had to whitelist the public IPs of the Databricks clusters in Azure.

I had checked the IP address in the Spark cluster UI (Master), but that was a private one.

So I used the code below to find the public IP.

from requests import get

# Ask an external service for the cluster's public (egress) IP address
ip = get('https://api.ipify.org').text

print('My public IP address is:', ip)

The confusion happened because the error from Databricks was so generic. Maybe Databricks should raise a more specific exception when an index URL is not reachable from the cluster; I would suggest improving that error message.
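As an illustration of the kind of check that would have saved time, the index URL can be probed directly from a notebook to see whether it is reachable at all; the feed path here is a placeholder:

from requests import get

# Probe the private index URL from the cluster; a timeout or connection error
# points to a networking problem rather than a missing package version.
feed_url = 'https://pkgs.dev.azure.com/<organization>/<project>/_packaging/<feed-name>/pypi/simple/'
try:
    resp = get(feed_url, timeout=10)
    print('Feed reachable, HTTP status:', resp.status_code)
except Exception as err:
    print('Feed not reachable from this cluster:', err)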

The next problem is that the public IPs assigned to these clusters are dynamic, so they change automatically. We are now looking for a way to make them static so that we can whitelist them. Thanks!


4 REPLIES

Debayan
Databricks Employee

Hi,

Typically for this kind of error, you can try something like the following:

  • python3 -m pip install --upgrade relay_package
  • Check the Python module name on PyPI and install that module (e.g. pip3 install relay_package).

Then rerun this command:

  • python3 -m pip install --upgrade relay_package

Sulfikkar
Contributor

@Kaniz Fatma, the solution @Debayan Mukherjee provided does not address the actual problem; I have just explained the issue in a bit more detail in the post.

Here are some links similar to what I have been trying to achieve. Please refer to them in case my explanation was not clear 🙂

https://polarpersonal.medium.com/releasing-and-using-python-packages-with-azure-devops-and-azure-dat...

https://towardsdatascience.com/install-custom-python-libraries-from-private-pypi-on-databricks-6a766...


