Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unable to install libraries from requirements.txt in a Serverless Job and spark_python_task

aav331
New Contributor

I am running into the following error when running a serverless job with a spark_python_task that uses GIT as the source for the code. The job was deployed as part of a DAB from a GitHub Actions runner.

Run failed with error message
 Library installation failed: Library installation attempted on serverless compute and failed. The library file does not exist or the user does not have permission to read the library file. Please check if the library file exists and the user has the right permissions to access the file. Error code: ERROR_NO_SUCH_FILE_OR_DIRECTORY, error message: Notebook environment installation failed:
ERROR: Could not open requirements file: [Errno 2] No such file or directory: '/tmp/dbx_pipeline/search_model_infra/src/requirements.txt'

This is my DAB definition:

resources:
  jobs:
    Search_Infra_Setup_VS_Endpoint_and_Index:
      name: "[Search Infra] Setup VS Endpoint and Index"
      tasks:
        - task_key: run_setup_script
          spark_python_task:
            python_file: dbx_pipeline/search_model_infra/src/setup_vector_search.py
            parameters:
            source: GIT
          environment_key: default_python
      git_source:
        git_url: https://github.com/git_repo
        git_provider: gitHub
        git_branch: develop
      queue:
        enabled: true
      environments:
        - environment_key: default_python
          spec:
            dependencies:
              - -r dbx_pipeline/search_model_infra/src/requirements.txt
            environment_version: "4"

 

1 ACCEPTED SOLUTION

Louis_Frolio
Databricks Employee

Hey @aav331, here’s a focused analysis of the issue and how to fix it.

 

Summary of the problem

The job is a serverless spark_python_task sourced from Git, and it fails to install packages from a requirements.txt because the file isn’t found at runtime: “No such file or directory: '/tmp/dbx_pipeline/search_model_infra/src/requirements.txt'”.
 

Diagnosis

Two things are at play:

  • You’re declaring the requirements file through the job’s environment spec as a dependency with “-r path”, but Asset Bundles expect requirements files to be wired via the task’s libraries section, not inside the environment spec (see the minimal sketch just after this list).
  • You are using source: GIT for the task, which Databricks advises against for bundles, because relative paths may not resolve consistently and the deployed job may not have the same file layout as your local copy. Using WORKSPACE with bundle deploy ensures the files are present under /Workspace for runtime resolution.
Also note that serverless Python/script tasks require an environment_key, which you’ve set (good); but the examples use a libraries mapping for requirements files or wheels, rather than environment spec with “-r …”.
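Here’s a minimal sketch of that difference, reusing the paths from your post (treat it as illustrative; the full corrected snippet is further down):

# Environment spec only installs packages; it does not stage repo files,
# so a repo-relative "-r" path has nothing to resolve against at run time.
environments:
  - environment_key: default_python
    spec:
      dependencies:
        - -r dbx_pipeline/search_model_infra/src/requirements.txt
      environment_version: "4"

# Recommended shape instead: attach the file through the task's libraries
# mapping, pointing at a path that exists after the bundle is deployed.
tasks:
  - task_key: run_setup_script
    environment_key: default_python
    libraries:
      - requirements: /Workspace/${workspace.file_path}/requirements.txt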
 

Likely root cause

  • The serverless runtime can’t see your requirements file because it isn’t staged into the job’s working directory when sourcing code directly from Git, and the environment spec doesn’t stage files; it only installs packages. As a result, pip can’t open the path you reference (“/tmp/dbx_pipeline/…”).
Recommended fixes

Pick one of these patterns (A is the most robust for DAB):
  • Pattern A — Use WORKSPACE and libraries.requirements:
    • Deploy the bundle so your repo assets (including requirements.txt) are synced to /Workspace/${workspace.file_path}. Then reference the requirements file in the task’s libraries section:
        libraries:
          - requirements: /Workspace/${workspace.file_path}/requirements.txt
    • This is the documented way to attach a requirements.txt to a job task; paths can be local, workspace, or UC volume, and the workspace path is recommended for serverless jobs deployed via bundles.
    • Switch the task to source: WORKSPACE (or omit source so WORKSPACE is used when git_source isn’t set), and deploy with the bundle to ensure the file exists at runtime.
  • Pattern B — Use wheel(s) instead of requirements:
    • Build a wheel in the bundle and install it via libraries.whl. This avoids per-run pip installs and is well supported in DAB examples.
  • Pattern C — Keep Git source but stage the requirements file to a supported path:
    • If you must use GIT, don’t rely on a repo-relative “-r …” in the environment spec. Instead, upload the requirements.txt to Workspace Files (or a UC volume) and reference that absolute path in the libraries.requirements mapping (sketched in full after this list):
        libraries:
          - requirements: /Workspace/Shared/<your-path>/requirements.txt
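Here’s a sketch of Pattern C, keeping your existing git_source but pointing the install at an absolute Workspace path. The /Workspace/Shared/search_model_infra/... path is a placeholder: use wherever you actually upload the file, and make sure it is uploaded (for example from your GitHub Actions workflow) before the job runs:

resources:
  jobs:
    Search_Infra_Setup_VS_Endpoint_and_Index:
      name: "[Search Infra] Setup VS Endpoint and Index"
      git_source:
        git_url: https://github.com/git_repo
        git_provider: gitHub
        git_branch: develop
      tasks:
        - task_key: run_setup_script
          spark_python_task:
            python_file: dbx_pipeline/search_model_infra/src/setup_vector_search.py
            source: GIT
          environment_key: default_python
          libraries:
            # Placeholder path: the file must already exist here (Workspace Files or a UC volume);
            # it is no longer resolved relative to the Git checkout.
            - requirements: /Workspace/Shared/search_model_infra/requirements.txt
      environments:
        - environment_key: default_python
          spec:
            environment_version: "4"

The point is that the requirements path no longer depends on the Git checkout layout at run time.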

Minimal, corrected bundle snippet

Using Pattern A (WORKSPACE + libraries.requirements) with a serverless job:
resources:
  jobs:
    search_infra_setup:
      name: "[Search Infra] Setup VS Endpoint and Index"
      tasks:
        - task_key: run_setup_script
          spark_python_task:
            python_file: ../src/setup_vector_search.py
            source: WORKSPACE
          environment_key: default_python
          libraries:
            - requirements: /Workspace/${workspace.file_path}/requirements.txt
      environments:
        - environment_key: default_python
          spec:
            environment_version: "4"

In your bundle, ensure the requirements.txt is included (for example via bundle include or workspace files), so it ends up under /Workspace/${workspace.file_path}/requirements.txt at deploy time.
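One way to make sure the file is synced is the bundle-level sync mapping in databricks.yml; this is mainly needed if the file would otherwise be excluded (for example by .gitignore), and the glob below is an assumption based on the repo layout in your post:

# databricks.yml (top level)
sync:
  include:
    - dbx_pipeline/search_model_infra/src/requirements.txt

Note that synced files keep their repo-relative path under /Workspace/${workspace.file_path}/ after databricks bundle deploy, so point libraries.requirements at wherever the file actually lands.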
 

Gotchas to check

  • The libraries.requirements path must be accessible to serverless (Workspace Files, a UC Volume, or a local path that exists after bundle deploy); a Unity Catalog volume variant is sketched after this list. Avoid ephemeral “/tmp/…” paths that aren’t guaranteed across runs.
  • For Asset Bundles, avoid source: GIT because “local relative paths may not point to the same content in the Git repository”; use WORKSPACE sources deployed via bundles instead.
  • For serverless Python/script tasks, keep environment_key set; install packages via libraries (requirements or wheels), not via “-r …” inside environment.spec.dependencies.
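For completeness, the same libraries mapping with a Unity Catalog volume path (catalog, schema, and volume names are placeholders):

libraries:
  - requirements: /Volumes/<catalog>/<schema>/<volume>/requirements.txt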
 
Hope this helps, Louis.


3 REPLIES


aav331
New Contributor

Thank you @Louis_Frolio! I used Pattern C and it resolved the issue for me.

Louis_Frolio
Databricks Employee

@aav331, if you are happy with the result, please "Accept as Solution." This will help others who may be in the same boat. Cheers, Louis.