
Using private package, getting ERROR: No matching distribution found for myprivatepackage

777433
New Contributor III

My project's setup.py file:

from setuptools import find_packages, setup

# __version__ is used below, so it must be importable; assumed here to be exposed by the package itself
from my_project import __version__

PACKAGE_REQUIREMENTS = ["pyyaml","confluent-kafka", "fastavro", "python-dotenv","boto3", "pyxlsb", "aiohttp", "myprivatepackage"]

LOCAL_REQUIREMENTS = ["delta-spark", "scikit-learn", "pandas", "mlflow", "databricks-sql-connector", "kafka-python"]

TEST_REQUIREMENTS = ["pytest", "coverage[toml]", "pytest-cov", "dbx>=0.7,<0.8"]

setup(
    name="my_project",
    packages=find_packages(exclude=["tests", "tests.*"]),
    setup_requires=["setuptools","wheel"],
    install_requires=PACKAGE_REQUIREMENTS,
    extras_require={"local": LOCAL_REQUIREMENTS, "test": TEST_REQUIREMENTS},
    entry_points = {
        "console_scripts": [
            "etl = my_project.tasks.sample_etl_task:entrypoint"
        ]
    },
    version=__version__,
    description="My project",
    author="me",
)

I am using dbx to deploy, so here is what my deployment.yml file looks like:

environments:
  dev:
    workflows:
      - name: "mytask"  
        tasks:
          - task_key: "mytask"
            new_cluster:
              spark_version: "14.2.x-scala2.12"
              node_type_id: "r5d.large"
              data_security_mode: "SINGLE_USER"
              spark_conf:
                spark.databricks.delta.preview.enabled: 'true'
                spark.databricks.cluster.profile: 'singleNode'
                spark.master: 'local[*, 4]'
              runtime_engine: STANDARD
              num_workers: 0  
            spark_python_task:
              python_file: "file://my_project/entity/mytask/tasks/mytask.py"

Then I run the following command to deploy:

dbx deploy --deployment-file ./conf/dev/deployment.yml -e dev

It deploys fine with no errors!
But when I run the job, I get the following error:

24/01/16 12:13:43 INFO SharedDriverContext: Failed to attach library dbfs:/Shared/dbx/projects/[REDACTED]/abc/artifacts/dist/[REDACTED]-0.8.0-py3-none-any.whl to Spark
java.lang.Throwable: Process List(/bin/su, libraries, -c, bash /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip install --upgrade /local_disk0/tmp/abc/[REDACTED]-0.8.0-py3-none-any.whl --disable-pip-version-check) exited with code 1.
Processing /local_disk0/tmp/abc/[REDACTED]-0.8.0-py3-none-any.whl
ERROR: Could not find a version that satisfies the requirement myprivatepackage (from [REDACTED]) (from versions: none)
ERROR: No matching distribution found for myprivatepackage

 How do I resolve this?

1 ACCEPTED SOLUTION

777433
New Contributor III

I added an init script to the compute so that the private package registry credentials are written to /etc/pip.conf.

Something like the following:

resource "databricks_workspace_file" "gitlab_pypi_init_script" {
  provider = databricks.workspace
  content_base64 = base64encode(<<-EOT
    #!/bin/bash
    if [[ $PYPI_TOKEN ]]; then
    use $PYPI_TOKEN
    fi
    echo $PYPI_TOKEN
    printf "[global]\n" > /etc/pip.conf
    printf "extra-index-url =\n" >> /etc/pip.conf
    printf "\thttps://__token__:$PYPI_TOKEN@gitlab.com/api/v4/projects/12345678/packages/pypi/simple\n" >> /etc/pip.conf
    EOT
  )
  path = "/FileStore/gitlab_pypi_init_script.sh"
}

 

 

I added this file to the Workspace and then referenced it under init scripts in the cluster's compute configuration, and it installed the private package on the cluster when it starts.

I also made sure the GitLab token was accessible as the PYPI_TOKEN variable by setting it in the cluster's Spark environment variables.
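For reference, here is a minimal sketch of how the new_cluster block in the dbx deployment file could wire both pieces together. It assumes the script is stored as a workspace file at the path used in the Terraform resource above, and the secret scope and key names are hypothetical:

environments:
  dev:
    workflows:
      - name: "mytask"
        tasks:
          - task_key: "mytask"
            new_cluster:
              spark_version: "14.2.x-scala2.12"
              node_type_id: "r5d.large"
              num_workers: 0
              # Run the pip.conf init script before cluster libraries are installed
              init_scripts:
                - workspace:
                    destination: "/FileStore/gitlab_pypi_init_script.sh"
              # Expose the GitLab token to the init script; the secret
              # scope/key names below are hypothetical
              spark_env_vars:
                PYPI_TOKEN: "{{secrets/my-scope/gitlab-pypi-token}}"
            spark_python_task:
              python_file: "file://my_project/entity/mytask/tasks/mytask.py"

Referencing the token through a secret scope keeps it out of the deployment file; if the script is uploaded to DBFS instead of the Workspace, the init_scripts entry would use a dbfs destination instead.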


2 REPLIES

Debayan
Esteemed Contributor III

Hi, does this look like a dependency error? Are all the dependencies packed into the whl? Also, could you please confirm that all the limitations are satisfied? Refer to: https://docs.databricks.com/en/compute/access-mode-limitations.html

