01-16-2024 05:17 AM
My project's setup.py file:

from setuptools import find_packages, setup

from my_project import __version__  # assumes __version__ is defined in my_project/__init__.py

PACKAGE_REQUIREMENTS = ["pyyaml", "confluent-kafka", "fastavro", "python-dotenv", "boto3", "pyxlsb", "aiohttp", "myprivatepackage"]
LOCAL_REQUIREMENTS = ["delta-spark", "scikit-learn", "pandas", "mlflow", "databricks-sql-connector", "kafka-python"]
TEST_REQUIREMENTS = ["pytest", "coverage[toml]", "pytest-cov", "dbx>=0.7,<0.8"]

setup(
    name="my_project",
    packages=find_packages(exclude=["tests", "tests.*"]),
    setup_requires=["setuptools", "wheel"],
    install_requires=PACKAGE_REQUIREMENTS,
    extras_require={"local": LOCAL_REQUIREMENTS, "test": TEST_REQUIREMENTS},
    entry_points={
        "console_scripts": [
            "etl = my_project.tasks.sample_etl_task:entrypoint"
        ]
    },
    version=__version__,
    description="My project",
    author="me",
)
I am using dbx to deploy, so here is what my deployment.yaml looks like:
environments:
  dev:
    workflows:
      - name: "mytask"
        tasks:
          - task_key: "mytask"
            new_cluster:
              spark_version: "14.2.x-scala2.12"
              node_type_id: "r5d.large"
              data_security_mode: "SINGLE_USER"
              spark_conf:
                spark.databricks.delta.preview.enabled: 'true'
                spark.databricks.cluster.profile: 'singleNode'
                spark.master: 'local[*, 4]'
              runtime_engine: STANDARD
              num_workers: 0
            spark_python_task:
              python_file: "file://my_project/entity/mytask/tasks/mytask.py"
Then I run the following command to deploy:
dbx deploy --deployment-file ./conf/dev/deployment.yml -e dev
It deploys fine. No errors!
But when I run the job, I get the following error:
24/01/16 12:13:43 INFO SharedDriverContext: Failed to attach library dbfs:/Shared/dbx/projects/[REDACTED]/abc/artifacts/dist/[REDACTED]-0.8.0-py3-none-any.whl to Spark
java.lang.Throwable: Process List(/bin/su, libraries, -c, bash /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip install --upgrade /local_disk0/tmp/abc/[REDACTED]-0.8.0-py3-none-any.whl --disable-pip-version-check) exited with code 1.
Processing /local_disk0/tmp/abc/[REDACTED]-0.8.0-py3-none-any.whl
ERROR: Could not find a version that satisfies the requirement myprivatepackage (from [REDACTED]) (from versions: none)
ERROR: No matching distribution found for myprivatepackage
How do I resolve this?
07-03-2024 08:14 AM - edited 07-03-2024 08:33 AM
I added an init script to the compute that writes the private package registry credentials into /etc/pip.conf.
Something as follows:
resource "databricks_workspace_file" "gitlab_pypi_init_script" {
provider = databricks.workspace
content_base64 = base64encode(<<-EOT
#!/bin/bash
if [[ $PYPI_TOKEN ]]; then
use $PYPI_TOKEN
fi
echo $PYPI_TOKEN
printf "[global]\n" > /etc/pip.conf
printf "extra-index-url =\n" >> /etc/pip.conf
printf "\thttps://__token__:$PYPI_TOKEN@gitlab.com/api/v4/projects/12345678/packages/pypi/simple\n" >> /etc/pip.conf
EOT
)
path = "/FileStore/gitlab_pypi_init_script.sh"
}
I added this file to the Workspace and then referenced it under the init scripts of the cluster compute; with that in place, the private package installs in the cluster when it starts.
I also made sure the GitLab token was accessible as the PYPI_TOKEN variable via the cluster's Spark environment variables.
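For reference, here is a minimal sketch of how those two pieces might be wired into the new_cluster spec of the deployment.yaml shown above. The secret scope and key names (gitlab / pypi-token) are assumptions; the token should come from a Databricks secret rather than being hard-coded:

new_cluster:
  # ...existing cluster settings...
  init_scripts:
    - workspace:
        destination: "/FileStore/gitlab_pypi_init_script.sh"
  spark_env_vars:
    # Assumed secret scope/key; Databricks resolves the reference at cluster start.
    PYPI_TOKEN: "{{secrets/gitlab/pypi-token}}"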
01-19-2024 09:38 PM
Hi, does this look like a dependency error? Are all the dependencies packed in the whl? Also, could you please confirm that all the limitations are satisfied? Refer: https://docs.databricks.com/en/compute/access-mode-limitations.html