Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unable to install libraries from requirements.txt in a Serverless Job and spark_python_task

aav331
New Contributor

I am running into the following error when running a serverless job with a spark_python_task that uses GIT as the source for the code. The job was deployed as part of a DAB from a GitHub Actions runner.

Run failed with error message
 Library installation failed: Library installation attempted on serverless compute and failed. The library file does not exist or the user does not have permission to read the library file. Please check if the library file exists and the user has the right permissions to access the file. Error code: ERROR_NO_SUCH_FILE_OR_DIRECTORY, error message: Notebook environment installation failed:
ERROR: Could not open requirements file: [Errno 2] No such file or directory: '/tmp/dbx_pipeline/search_model_infra/src/requirements.txt'

This is my DAB definition:

resources:
  jobs:
    Search_Infra_Setup_VS_Endpoint_and_Index:
      name: "[Search Infra] Setup VS Endpoint and Index"
      tasks:
        - task_key: run_setup_script
          spark_python_task:
            python_file: dbx_pipeline/search_model_infra/src/setup_vector_search.py
            parameters:
            source: GIT
          environment_key: default_python
      git_source:
        git_url: https://github.com/git_repo
        git_provider: gitHub
        git_branch: develop
      queue:
        enabled: true
      environments:
        - environment_key: default_python
          spec:
            dependencies:
              - -r dbx_pipeline/search_model_infra/src/requirements.txt
            environment_version: "4"

 

1 ACCEPTED SOLUTION

Louis_Frolio
Databricks Employee

Hey @aav331, here’s a focused analysis of the issue and how to fix it.

 

Summary of the problem

The job is a serverless spark_python_task sourced from Git, and it fails to install packages from a requirements.txt because the file isn’t found at runtime: “No such file or directory: '/tmp/dbx_pipeline/search_model_infra/src/requirements.txt'”.
 

Diagnosis

Two things are at play:

  • You’re declaring the requirements file through the job’s environment spec as a dependency with “-r path”, but Asset Bundles expect requirements files to be wired via the task’s libraries section, not inside the environment spec (see the minimal sketch just after this list).
  • You are using source: GIT for the task, which Databricks advises against for bundles, because relative paths may not resolve consistently and the deployed job may not have the same file layout as your local copy. Using WORKSPACE with bundle deploy ensures the files are present under /Workspace for runtime resolution.
Also note that serverless Python/script tasks require an environment_key, which you’ve set (good); but the examples use a libraries mapping for requirements files or wheels, rather than environment spec with “-r …”.
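Here’s a minimal sketch of that difference, reusing the paths from your post (treat it as illustrative; the full corrected snippet is further down):

# Environment spec only installs packages; it does not stage repo files,
# so a repo-relative "-r" path has nothing to resolve against at run time.
environments:
  - environment_key: default_python
    spec:
      dependencies:
        - -r dbx_pipeline/search_model_infra/src/requirements.txt
      environment_version: "4"

# Recommended shape instead: attach the file through the task's libraries
# mapping, pointing at a path that exists after the bundle is deployed.
tasks:
  - task_key: run_setup_script
    environment_key: default_python
    libraries:
      - requirements: /Workspace/${workspace.file_path}/requirements.txt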
 

Likely root cause

  • The serverless runtime can’t see your requirements file because it isn’t staged into the job’s working directory when sourcing code directly from Git, and the environment spec doesn’t stage files; it only installs packages. As a result, pip can’t open the path you reference (“/tmp/dbx_pipeline/…”).
Recommended fixes

Pick one of these patterns (A is the most robust for DAB):
  • Pattern A — Use WORKSPACE and libraries.requirements:
    • Deploy the bundle so your repo assets (including requirements.txt) are synced to /Workspace/${workspace.file_path}. Then reference the requirements file in the task’s libraries section:
        libraries:
          - requirements: /Workspace/${workspace.file_path}/requirements.txt
    • This is the documented way to attach a requirements.txt to a job task; paths can be local, workspace, or UC volume, and the workspace path is recommended for serverless jobs deployed via bundles.
    • Switch the task to source: WORKSPACE (or omit source so WORKSPACE is used when git_source isn’t set), and deploy with the bundle to ensure the file exists at runtime.
  • Pattern B — Use wheel(s) instead of requirements:
    • Build a wheel in the bundle and install it via libraries.whl. This avoids per-run pip installs and is well supported in DAB examples.
  • Pattern C — Keep Git source but stage the requirements file to a supported path:
    • If you must use GIT, don’t rely on a repo-relative “-r …” in the environment spec. Instead, upload the requirements.txt to Workspace Files (or a UC volume) and reference that absolute path in the libraries.requirements mapping (sketched in full after this list):
        libraries:
          - requirements: /Workspace/Shared/<your-path>/requirements.txt
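Here’s a sketch of Pattern C, keeping your existing git_source but pointing the install at an absolute Workspace path. The /Workspace/Shared/search_model_infra/... path is a placeholder: use wherever you actually upload the file, and make sure it is uploaded (for example from your GitHub Actions workflow) before the job runs:

resources:
  jobs:
    Search_Infra_Setup_VS_Endpoint_and_Index:
      name: "[Search Infra] Setup VS Endpoint and Index"
      git_source:
        git_url: https://github.com/git_repo
        git_provider: gitHub
        git_branch: develop
      tasks:
        - task_key: run_setup_script
          spark_python_task:
            python_file: dbx_pipeline/search_model_infra/src/setup_vector_search.py
            source: GIT
          environment_key: default_python
          libraries:
            # Placeholder path: the file must already exist here (Workspace Files or a UC volume);
            # it is no longer resolved relative to the Git checkout.
            - requirements: /Workspace/Shared/search_model_infra/requirements.txt
      environments:
        - environment_key: default_python
          spec:
            environment_version: "4"

The point is that the requirements path no longer depends on the Git checkout layout at run time.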

Minimal, corrected bundle snippet

Using Pattern A (WORKSPACE + libraries.requirements) with a serverless job:
resources:
  jobs:
    search_infra_setup:
      name: "[Search Infra] Setup VS Endpoint and Index"
      tasks:
        - task_key: run_setup_script
          spark_python_task:
            python_file: ../src/setup_vector_search.py
            source: WORKSPACE
          environment_key: default_python
          libraries:
            - requirements: /Workspace/${workspace.file_path}/requirements.txt
      environments:
        - environment_key: default_python
          spec:
            environment_version: "4"

In your bundle, ensure the requirements.txt is included (for example via bundle include or workspace files), so it ends up under /Workspace/${workspace.file_path}/requirements.txt at deploy time.
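One way to make sure the file is synced is the bundle-level sync mapping in databricks.yml; this is mainly needed if the file would otherwise be excluded (for example by .gitignore), and the glob below is an assumption based on the repo layout in your post:

# databricks.yml (top level)
sync:
  include:
    - dbx_pipeline/search_model_infra/src/requirements.txt

Note that synced files keep their repo-relative path under /Workspace/${workspace.file_path}/ after databricks bundle deploy, so point libraries.requirements at wherever the file actually lands.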
 

Gotchas to check

  • The libraries.requirements path must be accessible to serverless (Workspace Files, a UC Volume, or a local path that exists after bundle deploy); a Unity Catalog volume variant is sketched after this list. Avoid ephemeral “/tmp/…” paths that aren’t guaranteed across runs.
  • For Asset Bundles, avoid source: GIT because “local relative paths may not point to the same content in the Git repository”; use WORKSPACE sources deployed via bundles instead.
  • For serverless Python/script tasks, keep environment_key set; install packages via libraries (requirements or wheels), not via “-r …” inside environment.spec.dependencies.
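For completeness, the same libraries mapping with a Unity Catalog volume path (catalog, schema, and volume names are placeholders):

libraries:
  - requirements: /Volumes/<catalog>/<schema>/<volume>/requirements.txt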
 
Hope this helps, Louis.


3 REPLIES


aav331
New Contributor

Thank you @Louis_Frolio! I used Pattern C and it resolved the issue for me.

Louis_Frolio
Databricks Employee

@aav331, if you are happy with the result, please "Accept as Solution." This will help others who may be in the same boat. Cheers, Louis.