
Issue with Private PyPI Mirror Package Dependencies Installation

hugodscarvalho
New Contributor II

I'm encountering an issue installing Python packages from a private PyPI mirror, specifically when the package has dependencies and the installation goes through cluster libraries (Cluster libraries | Databricks on AWS). Initially everything worked smoothly: packages without dependencies were installed and executed as expected. However, as my package evolved and a more complex version was deployed to Artifactory, one that declares dependencies via the install_requires parameter in the package's setup.py, the installation started to fail. The package's dependencies from public PyPI are not resolved, resulting in errors like the following:

 

ERROR: Could not find a version that satisfies the requirement package_x==1.2.3 (from versions: none).
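
For context, the failing package declares its dependencies roughly like this (a minimal sketch; the package name is hypothetical, and package_x is taken from the error message above):

# setup.py of the private package (hypothetical names)
from setuptools import setup, find_packages

setup(
    name="my_private_package",
    version="0.1.0",
    packages=find_packages(),
    # Dependencies that should be resolved from public PyPI
    install_requires=[
        "package_x==1.2.3",
    ],
)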

 

It seems that the installation process on the cluster might be passing the mirror URL as pip's --index-url option instead of --extra-index-url. Interestingly, in a notebook context (Notebook-scoped Python libraries | Databricks on AWS), installing the same package with --extra-index-url proceeds without any issues.
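
For reference, this is the kind of notebook-scoped install that works (the mirror URL and package name are placeholders):

# Notebook cell: public PyPI stays the primary index; the private
# mirror is consulted as an additional index
%pip install my_private_package==0.1.0 --extra-index-url https://artifactory.example.com/artifactory/api/pypi/pypi-local/simple

The distinction matters because --extra-index-url adds the mirror alongside pip's default index, so transitive dependencies can still come from public PyPI, whereas --index-url replaces the default index entirely, which would explain the "from versions: none" error when the mirror does not proxy public PyPI.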

This inconsistency is proving to be quite challenging, particularly as projects become more complex and reliant on external dependencies.

I'm reaching out to the community for any insights or assistance in resolving this matter. If anyone has encountered a similar issue or has suggestions for potential workarounds, I would greatly appreciate your input.

 

2 REPLIES

Hello @Retired_mod ,

Thank you for all the help and the multiple suggestions! I was able to solve the issue using the second option.

It turns out that our problem stemmed from an incorrectly configured JFrog Artifactory setup. Once we switched to a virtual repository that combines our local repository (a private PyPI server for internal deployments) with a remote one (a proxy to public PyPI), our Databricks cluster installations became consistent, including the dependencies from public PyPI.
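
In practical terms, once the virtual repository exists, cluster installs can point at it alone, since it resolves both internal packages and their public PyPI dependencies. A minimal sketch, assuming a notebook-scoped install and a placeholder JFrog URL (for cluster libraries, the same URL goes in the optional repository/index field of the PyPI library configuration):

# Install against the JFrog *virtual* repository (placeholder URL);
# it serves internal packages and proxies public PyPI, so no
# --extra-index-url is needed
%pip install my_private_package==0.1.0 --index-url https://artifactory.example.com/artifactory/api/pypi/pypi-virtual/simple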

I really appreciate your support!

Adiga
New Contributor II

Hi @hugodscarvalho ,

I am also at this point, where the transitive dependencies (available in JFrog) are not getting installed on my job cluster. Could you please elaborate a bit on what exactly needed to change in the JFrog setup to make this work? That would be a great help.

Thanks in advance.
