10-06-2022 01:49 PM
I am trying to install a package that was uploaded to an Azure DevOps Artifacts feed onto a Databricks cluster by using pip.conf.
Below are the steps I followed (step 1: install in the local IDE).
Up to step 4 it works fine. However, when I try to replicate the same steps to install the package on the Azure Databricks cluster, it fails. Below are the steps I followed (step 2: install in the Databricks cluster).
I have checked the secret list and scopes and everything is valid. However, when I try to install the package on the cluster via PyPI, it does not pick up the index URL I provided in pip.conf. I also tried executing the pip install command from a notebook, passing the index URL explicitly, and it says the package version is not found in the DevOps Artifacts feed. The same works fine in my local IDE.
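For context, a minimal sketch of what the /etc/pip.conf on the cluster pointed to (the organization, feed name, and token below are placeholders, not my actual values):

[global]
index-url = https://build:<PAT>@pkgs.dev.azure.com/<my-org>/_packaging/<my-feed>/pypi/simple/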
This is the error I got when I passed the index URL and tried to install the package manually from the cluster:
ERROR: Could not find a version that satisfies the requirement RelayDataVault==0.5.3 (from versions: none)
ERROR: No matching distribution found for RelayDataVault==0.5.3
The same package installs fine in my local IDE.
Also, as I mentioned, by default when I install the package from the cluster it does not search the index URL I specified in the /etc/pip.conf file.
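For illustration, a minimal sketch of the kind of manual install I attempted from a notebook cell, assuming the PAT is stored in a secret scope (the organization, feed, scope, and key names below are placeholders):

import subprocess
import sys

# dbutils is only available inside Databricks notebooks; scope/key names are placeholders
pat = dbutils.secrets.get(scope="devops-scope", key="pat-token")
# Azure DevOps Artifacts PyPI endpoint; any username works when the PAT is the password
index_url = f"https://build:{pat}@pkgs.dev.azure.com/my-org/_packaging/my-feed/pypi/simple/"

subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "--index-url", index_url,
    "RelayDataVault==0.5.3",
])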
Does anyone have a clue here?
10-14-2022 12:07 AM
Thanks for your time, @Debayan Mukherjee and @Kaniz Fatma. We figured out the issue along with the infra team: we had to whitelist the public IPs of the Databricks clusters in Azure.
I had checked the IP address from the Spark cluster UI - Master, but that was a private one.
So I used the code below to find the cluster's public IP.
from requests import get
# The cluster's outbound (public) IP address, as seen by an external service
ip = get('https://api.ipify.org').text
print('My public IP address is:', ip)
The confusion happened because the error from Databricks was so generic. Maybe Databricks should raise a more specific exception when an index URL is not reachable from the cluster; I would suggest improving that error message.
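For anyone hitting the same generic error, a quick check like the sketch below (the feed URL is a placeholder) helped separate networking problems from genuine package problems: any HTTP response means the feed host is reachable, while a timeout or connection error points to networking/whitelisting.

import requests

feed_url = "https://pkgs.dev.azure.com/my-org/_packaging/my-feed/pypi/simple/"  # placeholder
try:
    # Even an authentication error (e.g. 401) proves the network path works
    resp = requests.get(feed_url, timeout=10)
    print("Reached the feed, HTTP status:", resp.status_code)
except requests.exceptions.RequestException as exc:
    print("Could not reach the feed:", exc)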
The next problem is that the public IPs assigned to these clusters are dynamic, so they change automatically. We are now looking for a way to make them static and whitelist them. Thanks!
10-07-2022 10:08 PM
Hi,
Typically, for this kind of error, you can try something like the following:
Rerun this command
10-10-2022 12:03 AM
@Kaniz Fatma, the solution @Debayan Mukherjee provided does not address the actual problem. I have just explained a bit more about the issue in the post.
I have a link similar to what I have been trying to achieve here. Please refer to it in case my explanation was not clear 🙂
10-10-2022 12:04 AM
Hi @Debayan Mukherjee, I have a link similar to what I have been trying to achieve here. Please refer to it in case my explanation was not clear 🙂