11-06-2025 03:11 PM
Hey @aav331, here’s a focused analysis of the issue in your post and how to fix it.
Summary of the problem
The job is a serverless spark_python_task sourced from Git, and it fails to install packages from a requirements.txt because the file isn’t found at runtime: “No such file or directory: '/tmp/dbx_pipeline/search_model_infra/src/requirements.txt'”.
Diagnosis
Two things are at play:
- You’re declaring the requirements file through the job’s environment spec as a dependency with “-r path”, but Asset Bundles expect requirements files to be wired via the task’s libraries section, not inside the environment spec.
- You are using source: GIT for the task, which Databricks advises against for bundles, because relative paths may not resolve consistently and the deployed job may not have the same file layout as your local copy. Using WORKSPACE with bundle deploy ensures the files are present under /Workspace for runtime resolution.
Also note that serverless Python/script tasks require an environment_key, which you’ve set (good). The examples, however, attach requirements files or wheels through a libraries mapping rather than through “-r …” in the environment spec.
Likely root cause
- The serverless runtime can’t see your requirements file because it isn’t staged into the job’s working directory when sourcing code directly from Git, and the environment spec doesn’t stage files; it only installs packages. As a result, pip can’t open the path you reference (“/tmp/dbx_pipeline/…”).
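For reference, the failing pattern described above probably looks something like the sketch below. The original job definition isn't shown in this reply, so treat the environment key, the version, and the repo-relative path as assumptions for illustration only.

```yaml
# Assumed reconstruction of the failing setup; not the original job definition.
environments:
  - environment_key: default_python
    spec:
      environment_version: "4"
      dependencies:
        # Hypothetical repo-relative path. Per the diagnosis above, nothing stages
        # this file for the serverless runtime, so pip cannot open it.
        - -r ../src/requirements.txt
```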
### Recommended fixes
Pick one of these patterns (A is the most robust for DAB):
- Pattern A: Use WORKSPACE and libraries.requirements:
  - Deploy the bundle so your repo assets (including requirements.txt) are synced to /Workspace/${workspace.file_path}. Then reference the requirements file in the task’s libraries section:
    ```yaml
    libraries:
      - requirements: /Workspace/${workspace.file_path}/requirements.txt
    ```
    This is the documented way to attach a requirements.txt to a job task; paths can be local, workspace, or UC volume, and the workspace path is recommended for serverless jobs deployed via bundles.
  - Switch the task to source: WORKSPACE (or omit source so WORKSPACE is used when git_source isn’t set), and deploy with the bundle to ensure the file exists at runtime.
- Pattern B: Use wheel(s) instead of requirements:
  - Build a wheel in the bundle and install it via libraries.whl. This avoids per-run pip installs and is well supported in DAB examples.
- Pattern C: Keep Git source but stage the requirements file to a supported path:
  - If you must use GIT, don’t rely on a repo-relative “-r …” in environment spec. Instead, upload the requirements.txt to Workspace Files (or a UC volume) and reference that absolute path in the libraries.requirements mapping:
    ```yaml
    libraries:
      - requirements: /Workspace/Shared/<your-path>/requirements.txt
    ```
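If you go with Pattern B, the bundle could look roughly like this. Treat it as a sketch rather than a drop-in config: the artifact name `search_model_infra_wheel`, the build command, and the `..` and `dist/` locations are assumptions about your project layout, and it presumes the project already has a pyproject.toml or setup.py. Only the parts relevant to the wheel are shown.

```yaml
artifacts:
  search_model_infra_wheel:          # hypothetical artifact name
    type: whl
    build: python -m build --wheel   # assumes the `build` package is installed locally
    path: ..                         # assumed project root, relative to this config file

resources:
  jobs:
    search_infra_setup:
      tasks:
        - task_key: run_setup_script
          spark_python_task:
            python_file: ../src/setup_vector_search.py
          environment_key: default_python
          libraries:
            # Assumed output location of the build step above
            - whl: ../dist/*.whl
```

The idea is that `databricks bundle deploy` builds the wheel and uploads it with the rest of the bundle, so the task no longer depends on a requirements.txt being reachable at run time.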
Minimal, corrected bundle snippet
Using Pattern A (WORKSPACE + libraries.requirements) with a serverless job:
```yaml
resources:
  jobs:
    search_infra_setup:
      name: "[Search Infra] Setup VS Endpoint and Index"
      tasks:
        - task_key: run_setup_script
          spark_python_task:
            python_file: ../src/setup_vector_search.py
            source: WORKSPACE
          environment_key: default_python
          libraries:
            - requirements: /Workspace/${workspace.file_path}/requirements.txt
      environments:
        - environment_key: default_python
          spec:
            environment_version: "4"
```
In your bundle, ensure the requirements.txt is included (for example via bundle include or workspace files), so it ends up under /Workspace/${workspace.file_path}/requirements.txt at deploy time.
Gotchas to check
- The libraries.requirements path must be accessible to serverless (Workspace Files, UC Volume, or local path that exists after bundle deploy). Avoid ephemeral “/tmp/…” paths that aren’t guaranteed across runs.
- For Asset Bundles, avoid source: GIT because “local relative paths may not point to the same content in the Git repository”; use WORKSPACE sources deployed via bundles instead.
- For serverless Python/script tasks, keep environment_key set; install packages via libraries (requirements or wheels), not via “-r …” inside environment.spec.dependencies.
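On making sure requirements.txt actually gets uploaded: bundles normally sync the files under the bundle root, but anything matched by .gitignore is skipped. If that applies to your file, a `sync` mapping along these lines may help. The path here is an assumption about where the file lives relative to your databricks.yml.

```yaml
# databricks.yml (top level); a sketch, adjust the pattern to your layout
sync:
  include:
    - requirements.txt   # assumed location: bundle root
```

After `databricks bundle deploy`, it's worth opening the bundle's files folder in the workspace and confirming the file is there before re-running the job.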
Hope this helps, Louis.