<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Deploy python application with submodules - Poetry library management in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/deploy-python-application-with-submodules-poetry-library/m-p/65818#M32934</link>
    <description>Deploy python application with submodules - Poetry library management in Data Engineering</description>
    <pubDate>Mon, 08 Apr 2024 15:54:31 GMT</pubDate>
    <dc:creator>57410</dc:creator>
    <dc:date>2024-04-08T15:54:31Z</dc:date>
    <item>
      <title>Deploy python application with submodules - Poetry library management</title>
      <link>https://community.databricks.com/t5/data-engineering/deploy-python-application-with-submodules-poetry-library/m-p/65818#M32934</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm using DBX (I'll soon move to Databricks Asset Bundles, but that doesn't change anything about my situation) to deploy a Python application to Databricks. I'm also using Poetry to manage my libraries and dependencies.&lt;/P&gt;&lt;P&gt;My project looks like this:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;Project A
├── Folders A
├── main.py
├── pyproject.toml
└── ~Project B
    ├── Folders B
    ├── main.py
    └── pyproject.toml&lt;/LI-CODE&gt;&lt;P&gt;Project B is a submodule with its own libraries and dependencies. To avoid duplicate imports, and to avoid managing libraries in Project A that are only used by Project B, I declare Project B as a path dependency in Project A's 'pyproject.toml' file.&lt;/P&gt;&lt;P&gt;Project A's pyproject.toml:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;[tool.poetry.dependencies]
python = "^3.10"
dbx = "^0.8.15"
project_b = { path = "./project_b", develop = true }&lt;/LI-CODE&gt;&lt;P&gt;This way, the poetry.lock file from Project A includes the libraries defined in its own pyproject.toml plus all the additional ones needed by Project B.&lt;/P&gt;&lt;P&gt;To deploy my code to Databricks, DBX builds a wheel file with the following METADATA information:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;Metadata-Version: 2.1
Name: project_a
Version: 1.0.0
Summary: some desc
Author: me
Author-email: me@me.com
Requires-Python: &amp;gt;=3.10,&amp;lt;4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: dbx (&amp;gt;=0.8.15,&amp;lt;0.9.0)
Requires-Dist: project_b @ file:///project_b&lt;/LI-CODE&gt;&lt;P&gt;We can see "dbx" and "project_b" listed, as defined in Project A's pyproject.toml file.&lt;/P&gt;&lt;P&gt;It then fails on Databricks when I try to run my job (deployed with DBX using a deployment.yml file), with the following error message:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;24/04/08 14:27:18 WARN LibraryState: [Thread 168] Failed to install library dbfs:/FileStore/my_location/4a4d6b50a44742d9be58fc544f272fd0/artifacts/dist/project_a-1.0.0-py3-none-any.whl
org.apache.spark.SparkException: Process List(/bin/su, libraries, -c, bash /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip install --upgrade /local_disk0/tmp/addedFile556d8bb0a3244f44834308af3f689c807372305198373422453/project_a-1.0.0-py3-none-any.whl --disable-pip-version-check) exited with code 1. ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/project_b'&lt;/LI-CODE&gt;&lt;P&gt;My assumption is that Databricks doesn't know how to resolve the relative path recorded in the wheel's metadata into a full path. In this case, it should be "dbfs:/FileStore/my_location/4a4d6b50a44742d9be58fc544f272fd0/artifacts/project_b".&lt;/P&gt;&lt;P&gt;DBX allows me to use a relative path when I deploy a job, like this:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;environments:
  local:
    workflows:
      - name: dbx_execute_job
        spark_python_task:
          python_file: file://main.py
          parameters:
            - '--config'
            - 'file:fuse://conf/jobs/config.yaml'&lt;/LI-CODE&gt;&lt;P&gt;Here "file://main.py" will point to "dbfs:/FileStore/my_location/4a4d6b50a44742d9be58fc544f272fd0/artifacts/main.py".&lt;BR /&gt;There's also a significant difference between the path I give in the deployment.yml file (file://, with only 2 slashes) and how Poetry records it (file:///project_b, with 3 slashes).&lt;/P&gt;&lt;P&gt;I don't know whether what I'm trying to do is achievable, but in the end I would like to deploy a Python application, with a submodule in it, without listing all of Project B's libraries in Project A's pyproject.toml file.&lt;/P&gt;&lt;P&gt;I would appreciate any help!&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Mon, 08 Apr 2024 15:54:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/deploy-python-application-with-submodules-poetry-library/m-p/65818#M32934</guid>
      <dc:creator>57410</dc:creator>
      <dc:date>2024-04-08T15:54:31Z</dc:date>
    </item>
  </channel>
</rss>