Hello Databricks Community,
I'm encountering an issue related to Python paths when working with notebooks in Databricks. I have the following structure in my project:
```
my_notebooks/
    my_notebook.py
my_package/
    __init__.py
    hello.py
databricks.yml
```
my_notebook.py:
```python
# Databricks notebook source
import sys

# Print the module search path, one entry per line, to compare the two environments
print(*sys.path, sep='\n')

# COMMAND ----------

from my_package.hello import hello_world

hello_world()
```
hello.py:
```python
def hello_world():
    print('Hello World')
```
databricks.yml:
```yaml
bundle:
  name: my_bundle

targets:
  dev:
    mode: production
    workspace:
      host: https://adb-XXXXXXXXXXXXXXXXXXXXX.azuredatabricks.net
      root_path: /dev/${bundle.name}
```
I would like to work with the notebooks in two ways:
- all developers work in their own Git folders;
- once their work is done and a pull request is completed, a pipeline calls the `databricks bundle deploy` command (e.g. `databricks bundle deploy -t dev`) to deploy the code to the `dev` environment.
The problem is this: when you execute the notebook from someone's Git folder, it works fine, because Python easily finds the `my_package` package. When you execute `my_notebook` from the location DAB deploys it to (`/Workspace/dev/my_bundle/files/my_notebooks/my_notebook`), the import fails because `my_package` cannot be found.
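For what it's worth, I can get the deployed notebook to run by putting the project root on `sys.path` myself before the import. This is only a sketch of the workaround; it assumes the notebook's working directory is the folder containing the notebook, which appears to be the case on recent Databricks Runtime versions:

```python
import sys
from pathlib import Path

# Workaround sketch: the project root is the parent of my_notebooks/,
# so putting it on sys.path lets `my_package` resolve in both locations.
# Assumption: Path.cwd() returns the notebook's own directory.
project_root = Path.cwd().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

from my_package.hello import hello_world

hello_world()
```

I would rather not repeat this boilerplate at the top of every notebook, though.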
What is the reason for this inconsistency?
I would like to be able to import Python packages from the root of my project, but the fact that the Python path behaviour differs between Git folders and DAB deployments makes that unreliable.