04-23-2024 01:08 AM
I'm encountering an issue installing Python packages from a private PyPI mirror when the package has dependencies and is installed as a cluster library (Cluster libraries | Databricks on AWS). Initially everything worked smoothly: packages with no dependencies were installed and executed as expected. However, once my package evolved and a more complex version was deployed to Artifactory, with dependencies specified in the install_requires parameter of the package's setup.py, the installation fails. The dependencies from public PyPI are not resolved, resulting in errors like the following:
ERROR: Could not find a version that satisfies the requirement package_x==1.2.3 (from versions: none).
It seems that the cluster installation process might be using the index-url parameter instead of extra-index-url. Interestingly, in a notebook context (Notebook-scoped Python libraries | Databricks on AWS), installing the same package with extra-index-url proceeds without any issues.
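For reference, the notebook-scoped install that works looks roughly like this (the repository URL and package name are placeholders, not our actual configuration):
%pip install --extra-index-url https://<artifactory-host>/artifactory/api/pypi/<repo>/simple my_package==1.0.0
With this, pip resolves the public dependencies from the default index and only uses the private mirror for our internal package.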
This inconsistency is proving to be quite challenging, particularly as projects become more complex and reliant on external dependencies.
I'm reaching out to the community for any insights or assistance in resolving this matter. If anyone has encountered a similar issue or has suggestions for potential workarounds, I would greatly appreciate your input.
05-06-2024 04:50 AM
Hi @hugodscarvalho, It's frustrating when package installation issues crop up, especially when dealing with dependencies in complex projects.
Letโs explore some potential solutions to address this inconsistency in your Databricks cluster installations.
1. Cluster-Scoped Initialization Scripts: You can create a cluster-scoped init script that installs the package with pip and include it in your cluster configuration (a fuller sketch that also sets the index URLs follows this list):
#!/bin/bash
pip install package_x==1.2.3
2. Check Artifactory Configuration: Verify that your Artifactory repository is set up to resolve public PyPI dependencies as well as your private packages, for example via a remote or virtual repository.
3. Notebook-Scoped Libraries: Since extra-index-url works in a notebook context, consider using notebook-scoped libraries for consistency. Specify extra-index-url in the notebook settings.
4. Dependency Resolution: Double-check the dependencies declared in the install_requires parameter of setup.py, and pin exact versions in setup.py to avoid any ambiguity.
Remember that debugging dependency issues can be time-consuming, but persistence pays off. Try these steps, and hopefully, you'll find a solution that works consistently for your Databricks cluster installations. If you encounter any further challenges, feel free to reach out for additional assistance.
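As a rough sketch combining options 1 and 3, an init script could pass both index URLs to pip explicitly so that public PyPI and the private Artifactory repository are consulted together. The host, repository, and package names below are placeholders for your own setup, and authentication is omitted:
#!/bin/bash
# Placeholders: replace <artifactory-host> and <repo> with your Artifactory details.
# --index-url keeps public PyPI as the primary index; --extra-index-url adds the private repository.
pip install \
  --index-url https://pypi.org/simple \
  --extra-index-url https://<artifactory-host>/artifactory/api/pypi/<repo>/simple \
  package_x==1.2.3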
To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedback not only helps us assist you better but also benefits other community members who may have similar questions in the future.
If you found the answer helpful, consider giving it a kudo. If the response fully addresses your question, please mark it as the accepted solution. This will help us close the thread and ensure your question is resolved.
We appreciate your participation and are here to assist you further if you need it!
05-07-2024 08:11 AM
Hello @Kaniz_Fatma,
Thank you for all the help and the multiple suggestions provided! I was able to successfully solve the issue based on the second option.
It turns out that our problem stemmed from an incorrectly configured JFrog Artifactory setup. Once we rectified this by using a virtual repository that combines our local repository (the private PyPI server for internal deployments) with a remote repository (a proxy to public PyPI), our Databricks cluster installations became consistent, including the dependencies from public PyPI.
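For anyone hitting the same issue, a quick way to sanity-check the virtual repository is to install a private package through it from any machine and confirm that its public PyPI dependencies resolve as well. The host, repository, and package names here are placeholders rather than our real ones:
# The virtual repository aggregates the local repo (internal packages) and the remote repo (proxy of public PyPI),
# so a single index-url is enough for both private packages and their public dependencies.
pip install --index-url https://<artifactory-host>/artifactory/api/pypi/<virtual-pypi-repo>/simple my_private_package==1.2.3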
I really appreciate your support!
07-11-2024 10:06 PM
Hi @hugodscarvalho ,
I am also at this point, where the transitive dependencies (available in JFrog) are not getting installed on my job cluster. Could you please elaborate a bit on what exactly needed to be changed in the JFrog setup for this to work? That would be a great help.
Thanks in advance.