
Using private package, getting ERROR: No matching distribution found for myprivatepackage

vinitkhandelwal
New Contributor III

Here is my project's setup.py file:

from setuptools import find_packages, setup

# __version__ must be importable at build time; the standard dbx template exposes it from the package itself
from my_project import __version__

PACKAGE_REQUIREMENTS = ["pyyaml","confluent-kafka", "fastavro", "python-dotenv","boto3", "pyxlsb", "aiohttp", "myprivatepackage"]

LOCAL_REQUIREMENTS = ["delta-spark", "scikit-learn", "pandas", "mlflow", "databricks-sql-connector", "kafka-python"]

TEST_REQUIREMENTS = ["pytest", "coverage[toml]", "pytest-cov", "dbx>=0.7,<0.8"]

setup(
    name="my_project",
    packages=find_packages(exclude=["tests", "tests.*"]),
    setup_requires=["setuptools","wheel"],
    install_requires=PACKAGE_REQUIREMENTS,
    extras_require={"local": LOCAL_REQUIREMENTS, "test": TEST_REQUIREMENTS},
    entry_points={
        "console_scripts": [
            "etl = my_project.tasks.sample_etl_task:entrypoint"
        ]
    },
    version=__version__,
    description="My project",
    author="me",
)

I am using dbx to deploy, so here is what my deployment.yml looks like:

environments:
  dev:
    workflows:
      - name: "mytask"  
        tasks:
          - task_key: "mytask"
            new_cluster:
              spark_version: "14.2.x-scala2.12"
              node_type_id: "r5d.large"
              data_security_mode: "SINGLE_USER"
              spark_conf:
                spark.databricks.delta.preview.enabled: 'true'
                spark.databricks.cluster.profile: 'singleNode'
                spark.master: 'local[*, 4]'
              runtime_engine: STANDARD
              num_workers: 0  
            spark_python_task:
              python_file: "file://my_project/entity/mytask/tasks/mytask.py"

Then I run the following command to deploy:

dbx deploy --deployment-file ./conf/dev/deployment.yml -e dev

It deploys fine. No errors!
But when I run the job, I get the following error:

24/01/16 12:13:43 INFO SharedDriverContext: Failed to attach library dbfs:/Shared/dbx/projects/[REDACTED]/abc/artifacts/dist/[REDACTED]-0.8.0-py3-none-any.whl to Spark
java.lang.Throwable: Process List(/bin/su, libraries, -c, bash /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip install --upgrade /local_disk0/tmp/abc/[REDACTED]-0.8.0-py3-none-any.whl --disable-pip-version-check) exited with code 1.
ERROR: Could not find a version that satisfies the requirement myprivatepackage (from [REDACTED]) (from versions: none)
ERROR: No matching distribution found for myprivatepackage

 How do I resolve this?

1 ACCEPTED SOLUTION


I added an init script to the compute so that the credentials for the private package registry are written to /etc/pip.conf.

Something like the following:

resource "databricks_workspace_file" "gitlab_pypi_init_script" {
  provider = databricks.workspace
  content_base64 = base64encode(<<-EOT
    #!/bin/bash
    # Abort if the GitLab token was not passed to the cluster as an environment variable
    if [[ -z "$PYPI_TOKEN" ]]; then
      echo "PYPI_TOKEN is not set" >&2
      exit 1
    fi
    # Register the private GitLab package registry as an extra pip index
    printf "[global]\n" > /etc/pip.conf
    printf "extra-index-url =\n" >> /etc/pip.conf
    printf "\thttps://__token__:%s@gitlab.com/api/v4/projects/12345678/packages/pypi/simple\n" "$PYPI_TOKEN" >> /etc/pip.conf
    EOT
  )
  path = "/FileStore/gitlab_pypi_init_script.sh"
}
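
For clarity, the script above generates an /etc/pip.conf that looks roughly like this (with the actual token value in place of $PYPI_TOKEN):

[global]
extra-index-url =
	https://__token__:$PYPI_TOKEN@gitlab.com/api/v4/projects/12345678/packages/pypi/simple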

 

 

I added this file to the Workspace, referenced it under the cluster's init scripts in the compute configuration, and it successfully installs the private package when the cluster starts.
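
For illustration, a minimal sketch of how that reference could look in the new_cluster block of the deployment.yml shown above, assuming the script lives at the workspace path used in the Terraform resource (use a dbfs source instead if the script is stored on DBFS):

            new_cluster:
              # ...existing cluster settings from deployment.yml above...
              init_scripts:
                - workspace:
                    destination: "/FileStore/gitlab_pypi_init_script.sh"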

I also made sure the GitLab token was available in the PYPI_TOKEN variable via the cluster's Spark environment variables.
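
As a sketch, the token can be injected through the cluster's spark_env_vars, for example by referencing a Databricks secret; the scope and key names here are placeholders, not from the original post:

            new_cluster:
              # ...existing cluster settings from deployment.yml above...
              spark_env_vars:
                # "gitlab" and "pypi_token" are hypothetical secret scope/key names
                PYPI_TOKEN: "{{secrets/gitlab/pypi_token}}"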


2 REPLIES

Debayan
Databricks Employee

Hi, does this look like a dependency error? Are all the dependencies packed in the whl? Also, could you please confirm that all the limitations are satisfied? Refer to: https://docs.databricks.com/en/compute/access-mode-limitations.html

