01-16-2024 05:17 AM
My project's setup.py file:
from setuptools import find_packages, setup

from my_project import __version__  # assumes __version__ is defined in my_project/__init__.py

PACKAGE_REQUIREMENTS = ["pyyaml", "confluent-kafka", "fastavro", "python-dotenv", "boto3", "pyxlsb", "aiohttp", "myprivatepackage"]
LOCAL_REQUIREMENTS = ["delta-spark", "scikit-learn", "pandas", "mlflow", "databricks-sql-connector", "kafka-python"]
TEST_REQUIREMENTS = ["pytest", "coverage[toml]", "pytest-cov", "dbx>=0.7,<0.8"]

setup(
    name="my_project",
    packages=find_packages(exclude=["tests", "tests.*"]),
    setup_requires=["setuptools", "wheel"],
    install_requires=PACKAGE_REQUIREMENTS,
    extras_require={"local": LOCAL_REQUIREMENTS, "test": TEST_REQUIREMENTS},
    entry_points={
        "console_scripts": [
            "etl = my_project.tasks.sample_etl_task:entrypoint"
        ]
    },
    version=__version__,
    description="My project",
    author="me",
)
I am using dbx to deploy, so here is what my deployment.yml looks like:
environments:
  dev:
    workflows:
      - name: "mytask"
        tasks:
          - task_key: "mytask"
            new_cluster:
              spark_version: "14.2.x-scala2.12"
              node_type_id: "r5d.large"
              data_security_mode: "SINGLE_USER"
              spark_conf:
                spark.databricks.delta.preview.enabled: 'true'
                spark.databricks.cluster.profile: 'singleNode'
                spark.master: 'local[*, 4]'
              runtime_engine: STANDARD
              num_workers: 0
            spark_python_task:
              python_file: "file://my_project/entity/mytask/tasks/mytask.py"
Then I run the following command to deploy:
dbx deploy --deployment-file ./conf/dev/deployment.yml -e dev
It deploys fine. No errors!
But when I run the job, I get the following error:
/local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip install --upgrade /local_disk0/tmp/abc/[REDACTED]-0.8.0-py3-none-any.whl --disable-pip-version-check) exited with code 1, and Processing /local_disk0/tmp/abc/[REDACTED]-0.8.0-py3-none-any.whl
24/01/16 12:13:43 INFO SharedDriverContext: Failed to attach library dbfs:/Shared/dbx/projects/[REDACTED]/abc/artifacts/dist/[REDACTED]-0.8.0-py3-none-any.whl to Spark
java.lang.Throwable: Process List(/bin/su, libraries, -c, bash /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip install --upgrade /local_disk0/tmp/abc/[REDACTED]-0.8.0-py3-none-any.whl --disable-pip-version-check) exited with code 1. ERROR: Could not find a version that satisfies the requirement myprivatepackage (from [REDACTED]) (from versions: none)
ERROR: No matching distribution found for myprivatepackage
How do I resolve this?
Accepted Solutions
07-03-2024 08:14 AM - edited 07-03-2024 08:33 AM
I added an init script to the compute so that the credentials for the private package index are written to /etc/pip.conf.
Something like the following (provisioned via Terraform):
resource "databricks_workspace_file" "gitlab_pypi_init_script" {
provider = databricks.workspace
content_base64 = base64encode(<<-EOT
#!/bin/bash
if [[ $PYPI_TOKEN ]]; then
use $PYPI_TOKEN
fi
echo $PYPI_TOKEN
printf "[global]\n" > /etc/pip.conf
printf "extra-index-url =\n" >> /etc/pip.conf
printf "\thttps://__token__:$PYPI_TOKEN@gitlab.com/api/v4/projects/12345678/packages/pypi/simple\n" >> /etc/pip.conf
EOT
)
path = "/FileStore/gitlab_pypi_init_script.sh"
}
I added this file to the Workspace and then referenced it under the cluster's init scripts, and the private package now gets installed on the cluster when it starts.
I also made sure the GitLab token was available in the PYPI_TOKEN variable via the cluster's Spark environment variables.
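For the dbx setup in the original question, a minimal sketch of how the new_cluster block could pick up the init script and the token might look like the following. The secret scope pypi and key gitlab_token are placeholders, and the workspace path is the one used in the Terraform snippet above:

new_cluster:
  spark_version: "14.2.x-scala2.12"
  node_type_id: "r5d.large"
  num_workers: 0
  init_scripts:
    - workspace:
        destination: "/FileStore/gitlab_pypi_init_script.sh"
  spark_env_vars:
    # placeholder secret scope/key holding the GitLab deploy token
    PYPI_TOKEN: "{{secrets/pypi/gitlab_token}}"

The same environment variable can of course also be set directly in the cluster UI instead of the deployment file.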
01-19-2024 09:38 PM
Hi, does this look like a dependency error? Are all the dependencies packed into the whl? Also, could you please confirm that all the limitations are satisfied? Refer: https://docs.databricks.com/en/compute/access-mode-limitations.html
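For what it's worth, one quick way to see which dependencies the built wheel actually declares (the file name below is illustrative, matching the version in the error log) is to inspect its METADATA; if myprivatepackage shows up under Requires-Dist, pip will try to resolve it from a package index at install time:

# list the dependencies declared by the wheel
unzip -p dist/my_project-0.8.0-py3-none-any.whl "*.dist-info/METADATA" | grep Requires-Dist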