<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Unable to install libraries from requirements.txt in a Serverless Job and spark_python_task in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unable-to-install-libraries-from-requirements-txt-in-a/m-p/138183#M50881</link>
    <description>&lt;P&gt;Thank you&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;&amp;nbsp;! I used Pattern C and it resolved it for me.&lt;/P&gt;</description>
    <pubDate>Fri, 07 Nov 2025 20:59:08 GMT</pubDate>
    <dc:creator>aav331</dc:creator>
    <dc:date>2025-11-07T20:59:08Z</dc:date>
    <item>
      <title>Unable to install libraries from requirements.txt in a Serverless Job and spark_python_task</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-install-libraries-from-requirements-txt-in-a/m-p/138040#M50849</link>
      <description>&lt;P&gt;I am running into the following error while trying to deploy a serverless job running a spark_python_task with GIT as the source for the code. The Job was deployed as part of a DAB from a Github Actions Runner.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Run failed with error message
 Library installation failed: Library installation attempted on serverless compute and failed. The library file does not exist or the user does not have permission to read the library file. Please check if the library file exists and the user has the right permissions to access the file. Error code: ERROR_NO_SUCH_FILE_OR_DIRECTORY, error message: Notebook environment installation failed:
ERROR: Could not open requirements file: [Errno 2] No such file or directory: '/tmp/dbx_pipeline/search_model_infra/src/requirements.txt'&lt;/LI-CODE&gt;&lt;P&gt;This is my DAB definition&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;resources:
  jobs:
    Search_Infra_Setup_VS_Endpoint_and_Index:
      name: "[Search Infra] Setup VS Endpoint and Index"
      tasks:
        - task_key: run_setup_script
          spark_python_task:
            python_file: dbx_pipeline/search_model_infra/src/setup_vector_search.py
            parameters:
            source: GIT
          environment_key: default_python
      git_source:
        git_url: https://github.com/git_repo
        git_provider: gitHub
        git_branch: develop
      queue:
        enabled: true
      environments:
        - environment_key: default_python
          spec:
            dependencies:
              - -r dbx_pipeline/search_model_infra/src/requirements.txt
            environment_version: "4"&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 06 Nov 2025 21:01:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-install-libraries-from-requirements-txt-in-a/m-p/138040#M50849</guid>
      <dc:creator>aav331</dc:creator>
      <dc:date>2025-11-06T21:01:40Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to install libraries from requirements.txt in a Serverless Job and spark_python_task</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-install-libraries-from-requirements-txt-in-a/m-p/138048#M50851</link>
<description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/196898"&gt;@aav331&lt;/a&gt;, here’s a focused analysis of the issue and how to fix it.&lt;/P&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;H3 class="paragraph"&gt;Summary of the problem&lt;/H3&gt;
&lt;DIV class="paragraph"&gt;The job is a serverless spark_python_task sourced from Git, and it fails to install packages from a requirements.txt because the file isn’t found at runtime: “No such file or directory: '/tmp/dbx_pipeline/search_model_infra/src/requirements.txt'”.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;H3 class="paragraph"&gt;Diagnosis Two things are at play:&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;You’re declaring the requirements file through the job’s &lt;STRONG&gt;environment spec&lt;/STRONG&gt; as a dependency with “-r path”, but Asset Bundles expect requirements files to be wired via the task’s &lt;STRONG&gt;libraries&lt;/STRONG&gt; section, not inside the environment spec.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;You are using &lt;STRONG&gt;source: GIT&lt;/STRONG&gt; for the task, which Databricks advises against for bundles, because relative paths may not resolve consistently and the deployed job may not have the same file layout as your local copy. Using WORKSPACE with bundle deploy ensures the files are present under /Workspace for runtime resolution.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="paragraph"&gt;Also note that serverless Python/script tasks require an &lt;STRONG&gt;environment_key&lt;/STRONG&gt;, which you’ve set (good); but the examples use a libraries mapping for requirements files or wheels, rather than environment spec with “-r …”.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;H3 class="paragraph"&gt;Likely root cause&lt;/H3&gt;
&lt;UL&gt;
&lt;LI class="paragraph"&gt;The serverless runtime can’t see your requirements file because it isn’t staged into the job’s working directory when sourcing code directly from Git, and the environment spec doesn’t stage files; it only installs packages. As a result, pip can’t open the path you reference (“/tmp/dbx_pipeline/…”).&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="paragraph"&gt;### Recommended fixes Pick one of these patterns (A is the most robust for DAB):&lt;/DIV&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;Pattern A — Use WORKSPACE and libraries.requirements:&lt;/DIV&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;Deploy the bundle so your repo assets (including requirements.txt) are synced to &lt;STRONG&gt;/Workspace/${workspace.file_path}&lt;/STRONG&gt;. Then reference the requirements file in the task’s libraries section:
&lt;UL&gt;
&lt;LI&gt;libraries:
&lt;UL&gt;
&lt;LI&gt;requirements: /Workspace/${workspace.file_path}/requirements.txt&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;This is the documented way to attach a requirements.txt to a job task; paths can be local, workspace, or UC volume, and the workspace path is recommended for serverless jobs deployed via bundles.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;Switch the task to &lt;STRONG&gt;source: WORKSPACE&lt;/STRONG&gt; (or omit source so WORKSPACE is used when git_source isn’t set), and deploy with the bundle to ensure the file exists at runtime.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;Pattern B — Use wheel(s) instead of requirements:&lt;/DIV&gt;
&lt;UL&gt;
&lt;LI&gt;Build a wheel in the bundle and install it via libraries.whl. This avoids per-run pip installs and is well supported in DAB examples.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;Pattern C — Keep Git source but stage the requirements file to a supported path:&lt;/DIV&gt;
&lt;UL&gt;
&lt;LI&gt;If you must use GIT, don’t rely on a repo-relative “-r …” in environment spec. Instead, upload the requirements.txt to &lt;STRONG&gt;Workspace Files&lt;/STRONG&gt; (or a UC volume) and reference that absolute path in the libraries.requirements mapping:
&lt;UL&gt;
&lt;LI&gt;libraries:
&lt;UL&gt;
&lt;LI&gt;requirements: /Workspace/Shared/&amp;lt;your-path&amp;gt;/requirements.txt&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 class="paragraph"&gt;Minimal, corrected bundle snippet&lt;/H3&gt;
&lt;DIV class="paragraph"&gt;Using Pattern A (WORKSPACE + libraries.requirements) with serverless job:&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&lt;CODE&gt;yaml
resources:
  jobs:
    search_infra_setup:
      name: "[Search Infra] Setup VS Endpoint and Index"
      tasks:
        - task_key: run_setup_script
          spark_python_task:
            python_file: ../src/setup_vector_search.py
            source: WORKSPACE
          environment_key: default_python
          libraries:
            - requirements: /Workspace/${workspace.file_path}/requirements.txt
      environments:
        - environment_key: default_python
          spec:
            environment_version: "4"
&lt;/CODE&gt; In your bundle, ensure the requirements.txt is included (for example via bundle include or workspace files), so it ends up under /Workspace/${workspace.file_path}/requirements.txt at deploy time.&lt;/DIV&gt;
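&lt;DIV class="paragraph"&gt;If you go with Pattern C instead, the job from the question might look roughly like the sketch below: the code is still pulled from Git, while the requirements file is referenced by an absolute Workspace path. This only rearranges keys already shown in this thread and has not been validated against a live workspace. It assumes you have already uploaded requirements.txt to that location, and the &amp;lt;your-path&amp;gt; segment is a placeholder you need to fill in.&lt;/DIV&gt;
&lt;LI-CODE lang="markup"&gt;resources:
  jobs:
    Search_Infra_Setup_VS_Endpoint_and_Index:
      name: "[Search Infra] Setup VS Endpoint and Index"
      tasks:
        - task_key: run_setup_script
          spark_python_task:
            # The script itself still comes from the Git repository below.
            python_file: dbx_pipeline/search_model_infra/src/setup_vector_search.py
            source: GIT
          environment_key: default_python
          libraries:
            # Placeholder path (assumption): upload requirements.txt here first.
            - requirements: /Workspace/Shared/&amp;lt;your-path&amp;gt;/requirements.txt
      git_source:
        git_url: https://github.com/git_repo
        git_provider: gitHub
        git_branch: develop
      environments:
        - environment_key: default_python
          spec:
            # No repo-relative "-r ..." entry in the environment spec.
            environment_version: "4"&lt;/LI-CODE&gt;
&lt;DIV class="paragraph"&gt;The trade-off is that the requirements file now lives outside the Git checkout, so you have to re-upload it whenever the dependencies change.&lt;/DIV&gt;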
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;H3 class="paragraph"&gt;Gotchas to check&lt;/H3&gt;
&lt;UL&gt;
&lt;LI class="paragraph"&gt;The &lt;STRONG&gt;libraries.requirements&lt;/STRONG&gt; path must be accessible to serverless (Workspace Files, UC Volume, or local path that exists after bundle deploy). Avoid ephemeral “/tmp/…” paths that aren’t guaranteed across runs.&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;For Asset Bundles, avoid &lt;STRONG&gt;source: GIT&lt;/STRONG&gt; because “local relative paths may not point to the same content in the Git repository”; use WORKSPACE sources deployed via bundles instead.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;For serverless Python/script tasks, keep &lt;STRONG&gt;environment_key&lt;/STRONG&gt; set; install packages via libraries (requirements or wheels), not via “-r …” inside environment.spec.dependencies.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Hope this helps, Louis.&lt;/DIV&gt;</description>
      <pubDate>Thu, 06 Nov 2025 23:11:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-install-libraries-from-requirements-txt-in-a/m-p/138048#M50851</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-11-06T23:11:57Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to install libraries from requirements.txt in a Serverless Job and spark_python_task</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-install-libraries-from-requirements-txt-in-a/m-p/138183#M50881</link>
      <description>&lt;P&gt;Thank you&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;&amp;nbsp;! I used Pattern C and it resolved it for me.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Nov 2025 20:59:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-install-libraries-from-requirements-txt-in-a/m-p/138183#M50881</guid>
      <dc:creator>aav331</dc:creator>
      <dc:date>2025-11-07T20:59:08Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to install libraries from requirements.txt in a Serverless Job and spark_python_task</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-install-libraries-from-requirements-txt-in-a/m-p/138233#M50887</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/196898"&gt;@aav331&lt;/a&gt;&amp;nbsp;, if you are happy with the result please "Accept as Solution." This will help others who may be in the same boat. Cheers, Louis.&lt;/P&gt;</description>
      <pubDate>Sat, 08 Nov 2025 22:18:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-install-libraries-from-requirements-txt-in-a/m-p/138233#M50887</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-11-08T22:18:57Z</dc:date>
    </item>
  </channel>
</rss>

