
Issue with Private PyPI Mirror Package Dependencies Installation

hugodscarvalho
New Contributor II

I'm encountering an issue with the installation of Python packages from a private PyPI mirror, specifically when the package has dependencies and is installed as a cluster library - Cluster libraries | Databricks on AWS. Initially, everything worked smoothly: packages without dependencies were installed and executed as expected. However, as my package evolved and a more complex version was deployed to Artifactory, with dependencies declared in the install_requires parameter of the package's setup.py, the installation started failing. The dependencies from public PyPI are not resolved, resulting in errors like the following:

 

ERROR: Could not find a version that satisfies the requirement package_x==1.2.3 (from versions: none).

 

It seems that the installation process on the cluster might be using the --index-url parameter instead of --extra-index-url. Interestingly, in a notebook context - Notebook-scoped Python libraries | Databricks on AWS - installing the same package with --extra-index-url works without any issues.
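
For reference, this is roughly the notebook command that works for me (the mirror URL and package name below are placeholders, not our real ones):

%pip install --extra-index-url https://artifactory.example.com/artifactory/api/pypi/pypi-virtual/simple my_package==1.2.3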

This inconsistency is proving to be quite challenging, particularly as projects become more complex and reliant on external dependencies.

I'm reaching out to the community for any insights or assistance in resolving this matter. If anyone has encountered a similar issue or has suggestions for potential workarounds, I would greatly appreciate your input.

 

1 ACCEPTED SOLUTION

Kaniz_Fatma
Community Manager

Hi @hugodscarvalho, Itโ€™s frustrating when package installation issues crop up, especially when dealing with dependencies in complex projects.

Letโ€™s explore some potential solutions to address this inconsistency in your Databricks cluster installations.

  1. Cluster-Scoped Initialization Scripts:

    • One approach you can try is using cluster-scoped initialization scripts. These scripts run during cluster startup, before libraries are installed and your workloads run, and allow you to set up dependencies or configurations.
    • Create an init script that installs the required Python packages using pip. You can include this script in your cluster configuration.
    • For example, if youโ€™re using Databricks File System (DBFS), you can create an init script and store it in a DBFS directory. Then attach this script to your cluster configuration.
    • Hereโ€™s an example of how you might structure your init script:
      #!/bin/bash
      pip install package_x==1.2.3
      
    • Make sure to adjust the package name and version according to your requirements.
    • Attach this script to your cluster, and it will run during cluster startup, ensuring that the necessary packages are installed before your workloads begin (see the sketch after this list for a variant that also points pip at your private mirror).
  2. Check Artifactory Configuration:

    • Verify that your Artifactory configuration for the private PyPI mirror is correctly set up. Ensure that the index URL and extra index URL are configured appropriately.
    • Double-check the repository settings in Artifactory to ensure that it serves the correct packages and versions.
    • If youโ€™re using a virtual repository, make sure it aggregates the relevant repositories (public and private) correctly.
  3. Notebook-Scoped Libraries:

    • As you observed, installing the same package in a notebook with %pip install --extra-index-url works. Notebook-scoped libraries only apply to the current notebook session, so they can serve as a temporary workaround while the cluster-level index configuration is being fixed, but they are not a substitute for cluster libraries in jobs.
  4. Dependency Resolution:

    • When deploying your package to Artifactory, ensure that all its dependencies are correctly specified in the install_requires parameter within setup.py.
    • Check if any of the dependencies have specific version constraints that might cause conflicts.
    • You can also try specifying the exact versions of dependencies in your setup.py to avoid any ambiguity.
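
Following up on point 1: if the cluster-level installation is indeed treating your private mirror as the only index, the init script can also register the mirror as an extra index so that public PyPI is still consulted for dependencies. The following is only a rough sketch - the URL is a placeholder for your own Artifactory endpoint, and it overwrites any existing /etc/pip.conf, so adapt it to your environment:

      #!/bin/bash
      # Register the private mirror as an *extra* index so transitive dependencies
      # can still be resolved from public PyPI. The URL is a placeholder - replace
      # it with your own Artifactory endpoint. Note: this overwrites /etc/pip.conf.
      printf '[global]\nextra-index-url = https://artifactory.example.com/artifactory/api/pypi/pypi-virtual/simple\n' > /etc/pip.conf

      # Install the private package; its public-PyPI dependencies now resolve normally.
      pip install package_x==1.2.3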

Remember that debugging dependency issues can be time-consuming, but persistence pays off. Try these steps, and hopefully, youโ€™ll find a solution that works consistently for your Databricks cluster installations. If you encounter any further challenges, feel free to reach out to seek additional assistance.

Good luck! 🚀

To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedback not only helps us assist you better but also benefits other community members who may have similar questions in the future.

If you found the answer helpful, consider giving it a kudo. If the response fully addresses your question, please mark it as the accepted solution. This will help us close the thread and ensure your question is resolved.

We appreciate your participation and are here to assist you further if you need it!

 


3 REPLIES 3

hugodscarvalho
New Contributor II

Hello @Kaniz_Fatma,

โœ…Thank you for all the help and the multiple suggestions provided! I was able to successfully solve the issue based on the second option.

It turns out that our problem stemmed from an incorrectly configured JFrog Artifactory setup. Once we fixed this by using a virtual repository that aggregates both our local repository (the private PyPI server for internal deployments) and a remote repository (a proxy to public PyPI), our Databricks cluster installations became consistent, including the dependencies from public PyPI.
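
For anyone who lands on the same problem: once the virtual repository aggregates both, a single index URL serves our internal packages and their public dependencies. Roughly speaking (server name, repository key, and package below are placeholders, not our real ones), the installation then only needs that one index:

pip install --index-url https://artifactory.example.com/artifactory/api/pypi/pypi-virtual/simple my_internal_package==1.2.3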

I really appreciate your support!

Adiga
New Contributor II

Hi @hugodscarvalho ,

I am also at this point, where the transitive dependencies (available in JFrog) are not getting installed on my job cluster. Could you please elaborate a bit on what exactly needed to be changed in the JFrog setup for this to work? That would be a great help.

Thanks in advance.
