Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Install a custom Python package from an Azure DevOps Artifacts feed on a Databricks cluster

Sulfikkar
Contributor

I am trying to install a package that was uploaded to an Azure DevOps Artifacts feed onto a Databricks cluster, using pip.conf.

Basically, these are the steps I followed (Step 1: install in the local IDE):

  1. Uploaded the package to the Azure DevOps feed using twine
  2. Created a PAT token in Azure DevOps
  3. Created the pip.conf on my local machine and put the PAT token in it
  4. Installed the library in my local IDE.
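For reference, a minimal sketch of the pip.conf used locally (the org, project, and feed names here are placeholders, not the real ones; the PAT goes in the password position of the URL):

```ini
[global]
extra-index-url=https://build:<PAT>@pkgs.dev.azure.com/<org>/<project>/_packaging/<feed>/pypi/simple/
```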

Up to step 4, everything works fine. However, when I try to replicate the same to install the package on the Azure Databricks cluster, it fails. These are the steps I followed (Step 2: install on the Databricks cluster):

  1. Stored the PAT token as a secret in Azure Key Vault
  2. Created a Databricks secret scope to access the secret in Azure Key Vault
  3. Used an environment variable to access the secret scope
  4. Created an init script to write the index URL into the /etc/pip.conf file.
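The init script in step 4 can be sketched roughly like this (the feed URL and secret names are placeholders; on a real cluster the script would write to /etc/pip.conf, which is parameterized here so the sketch can run anywhere):

```shell
#!/bin/bash
# Sketch of a cluster-scoped init script (placeholder names, not real ones).
# On Databricks, PYPI_TOKEN would come from the secret scope via a cluster
# environment variable, e.g.  PYPI_TOKEN={{secrets/<scope>/<secret-name>}}
set -euo pipefail

PYPI_TOKEN="${PYPI_TOKEN:-dummy-pat}"
CONF_PATH="${CONF_PATH:-/tmp/pip.conf}"   # on a real cluster: /etc/pip.conf

# Point pip at the private Azure DevOps feed, authenticating with the PAT.
cat > "$CONF_PATH" <<EOF
[global]
extra-index-url=https://build:${PYPI_TOKEN}@pkgs.dev.azure.com/<org>/<project>/_packaging/<feed>/pypi/simple/
EOF

echo "wrote $CONF_PATH"
```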

I have checked the secret list and scopes, and everything is valid. However, when I try to install the package from the cluster using PyPI, it does not point to the index URL I provided in pip.conf. I also tried to execute the pip install command from a notebook, passing the index URL, and it says the package version is not found in the DevOps artifact feed. The same works fine in my local IDE.
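For context, the manual install from the notebook was along these lines (the URL segments are placeholders for the real org/project/feed):

```
%pip install RelayDataVault==0.5.3 --index-url https://build:<PAT>@pkgs.dev.azure.com/<org>/<project>/_packaging/<feed>/pypi/simple/
```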

This is the error I get when I pass the index URL and try to install the package manually from the cluster:

ERROR: Could not find a version that satisfies the requirement RelayDataVault==0.5.3 (from versions: none)

ERROR: No matching distribution found for RelayDataVault==0.5.3

But the same package installs fine in my local IDE.

Also, as I mentioned, by default when I install the package from the cluster it does not search the index URL I specified in the /etc/pip.conf file.
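As a side note, one way to check which pip.conf files pip actually consults on the cluster (and what it loaded from them) is:

```shell
# Lists every config file location pip tries, plus the values it loaded.
python3 -m pip config list -v
```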

Does anyone have a clue here?

1 ACCEPTED SOLUTION


Sulfikkar
Contributor

Thanks for your time, @Debayan Mukherjee​ and @Kaniz Fatma​. Together with the infra team, we figured out that we had to whitelist the public IPs of the Databricks clusters in Azure.

I had checked the IP address in the Spark cluster UI (Master), but that was a private one.

So I used the code below to find my public IP:

from requests import get

# Ask an external service which IP our outbound traffic appears to come from.
ip = get('https://api.ipify.org').text
print('My public IP address is:', ip)

The confusion arose because the error from Databricks was so generic. Maybe Databricks should raise a more specific exception when an index URL is not reachable from the cluster; I would suggest improving that error message.

The next problem is that the public IPs assigned to these clusters are dynamic, so they change automatically. We are now looking for a way to make them static so we can whitelist them. Thanks.


5 REPLIES

Debayan
Esteemed Contributor III

Hi,

Typically, for this kind of error, you can try something like the following:

  • python3 -m pip install --upgrade relay_package
  • check the module name on PyPI and install that module (e.g., pip3 install relay_package)

Then rerun this command:

  • python3 -m pip install --upgrade relay_package

Kaniz
Community Manager

Hi @Sulfikkar Basheer Shylaja​, we haven't heard from you since the last response from @Debayan Mukherjee​, and I was checking back to see whether you have a resolution yet.

If you have any solution, please share it with the community as it can be helpful to others. Otherwise, we will respond with more details and try to help.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

@Kaniz Fatma​, the solution @Debayan Mukherjee​ provided does not address the actual problem. I have explained the issue in a bit more detail in the post.

Here are some links similar to what I have been trying to achieve. Please refer to them in case my explanation was not clear 🙂

https://polarpersonal.medium.com/releasing-and-using-python-packages-with-azure-devops-and-azure-dat...

https://towardsdatascience.com/install-custom-python-libraries-from-private-pypi-on-databricks-6a766...


