Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Deploy python application with submodules - Poetry library management

57410
New Contributor

Hi,

I'm using DBX (I'll soon move to Databricks Asset Bundles, but that doesn't change anything about my situation) to deploy a Python application to Databricks. I'm also using Poetry to manage my libraries and dependencies.

My project looks like this:

Project A
├── Folders A
├── main.py
├── pyproject.toml
└── Project B
    ├── Folders B
    ├── main.py
    └── pyproject.toml

Project B is a submodule with its own libraries and dependencies. To avoid duplicate declarations, and to avoid managing libraries in Project A that are only used by Project B, I declare Project B as a path dependency in Project A's pyproject.toml file.

Project A's pyproject.toml file:

[tool.poetry.dependencies]
python = "^3.10"
dbx = "^0.8.15"
project_b = { path = "./project_b", develop = true }

This way, Project A's poetry.lock file includes the libraries defined in its own pyproject.toml plus all of Project B's dependencies.

To deploy my code to Databricks, DBX builds a wheel file with the following METADATA:

Metadata-Version: 2.1
Name: project_a
Version: 1.0.0
Summary: some desc
Author: me
Author-email: me@me.com
Requires-Python: >=3.10,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: dbx (>=0.8.15,<0.9.0)
Requires-Dist: project_b @ file:///project_b

We can see "dbx" and "project_b" listed, as defined in Project A's pyproject.toml file.
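For anyone who wants to see why pip fails on the cluster, the Requires-Dist entry can be parsed with the standard library. A minimal sketch (the requirement string is copied from the METADATA above):

```python
from urllib.parse import urlparse

# Requirement string as written by Poetry into the wheel's METADATA.
req = "project_b @ file:///project_b"
name, _, url = req.partition(" @ ")

parsed = urlparse(url)
# file:/// means an empty host and an absolute path: pip tries to
# install from /project_b on the cluster's local filesystem, which
# matches the "No such file or directory: '/project_b'" error.
print(name, parsed.path)  # project_b /project_b
```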

The job (deployed with DBX using a deployment.yml file) then fails on Databricks when I try to run it, with the following error message:

24/04/08 14:27:18 WARN LibraryState: [Thread 168] Failed to install library dbfs:/FileStore/my_location/4a4d6b50a44742d9be58fc544f272fd0/artifacts/dist/project_a-1.0.0-py3-none-any.whl
org.apache.spark.SparkException: Process List(/bin/su, libraries, -c, bash /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip install --upgrade /local_disk0/tmp/addedFile556d8bb0a3244f44834308af3f689c807372305198373422453/project_a-1.0.0-py3-none-any.whl --disable-pip-version-check) exited with code 1. ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/project_b'

My assumption is that pip on Databricks doesn't know how to resolve the absolute path baked into the wheel. In this case, the actual location should be "dbfs:/FileStore/my_location/4a4d6b50a44742d9be58fc544f272fd0/artifacts/project_b".

DBX allows me to use a relative path when I deploy a job, like this:

environments:
  local:
    workflows:
      - name: dbx_execute_job
        spark_python_task:
          python_file: file://main.py
          parameters:
            - '--config'
            - 'file:fuse://conf/jobs/config.yaml'

Here, "file://main.py" points to "dbfs:/FileStore/my_location/4a4d6b50a44742d9be58fc544f272fd0/artifacts/main.py".
There's also a significant difference between the path I give in the deployment.yml file ("file://", only two slashes) and how Poetry writes it ("file:///project_b", with three slashes).
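The two-slash vs three-slash distinction can be checked with the standard library: with two slashes the first segment is parsed as a host, while three slashes mean an empty host and an absolute path. A quick sketch using the two URLs from this post:

```python
from urllib.parse import urlparse

two = urlparse("file://main.py")       # style used in deployment.yml
three = urlparse("file:///project_b")  # style Poetry writes in METADATA

# Two slashes: "main.py" lands in the host part and the path is empty,
# which leaves room for DBX to substitute its own artifact location.
assert two.netloc == "main.py" and two.path == ""

# Three slashes: empty host, absolute path -- pip resolves /project_b
# literally on whatever machine runs the install.
assert three.netloc == "" and three.path == "/project_b"
```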

I don't know if what I'm trying to do is achievable, but in the end I would like to be able to deploy a Python application containing a submodule, without listing all of Project B's libraries in Project A's pyproject.toml file.

I would appreciate any help!

Thank you

 
