Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Inconsistent PYTHONPATH, Git folders vs DAB

mydefaultlogin
New Contributor

Hello Databricks Community,

I'm encountering an issue related to Python paths when working with notebooks in Databricks. My project has the following structure:

my_notebooks/
  - my_notebook.py
my_package/
  - __init__.py
  - hello.py
databricks.yml

my_notebook.py:

# Databricks notebook source
import sys
print(*sys.path, sep='\n')

# COMMAND ----------

from my_package.hello import hello_world
hello_world()

hello.py:

def hello_world():
    print('Hello World')

databricks.yml:

bundle:
  name: my_bundle
targets:
  dev:
    mode: production
    workspace:
      host: https://adb-XXXXXXXXXXXXXXXXXXXXX.azuredatabricks.net
      root_path: /dev/${bundle.name}

I would like to work with the notebooks in two ways:
- All developers work in their own Git folders.
- Once their work is done and a pull request is completed, a pipeline calls the `databricks bundle deploy` command to deploy the code to the `dev` environment.

The problem is that when you execute a notebook from someone's Git folder it works fine because Python easily finds the `my_package` package. When you try to execute `my_notebook` from the location it is deployed to by DAB (`/Workspace/dev/my_bundle/files/my_notebooks/my_notebook`), it does not work because `my_package` cannot be imported.

What is the reason for this inconsistency?
I would like to be able to import Python packages from the root of my project, but the Python path behaves differently in Git folders and in DAB deployments.

1 REPLY

Brahmareddy
Honored Contributor II

Hi mydefaultlogin,

How are you doing today? As I understand it, you're right: when you run notebooks from your Git folder, Python can find my_package because the Git folder's root is on sys.path. When you deploy using Databricks Asset Bundles (DAB), the notebook runs from a different path (like /Workspace/dev/...), and the project root is no longer on Python's path by default, so it can't find the package. A simple fix is to manually add your project root to Python's path at the top of your notebook using sys.path.append('/Workspace/dev/my_bundle/files'). This helps Python locate your package just like it does in the Git setup. It's a common issue with DAB deployments, and this quick adjustment should solve it. Let me know if you'd like help making it reusable!
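For what it's worth, here is a minimal sketch of a reusable version that avoids hardcoding the deployment path. It walks up from a starting directory until it finds a folder containing my_package and appends that folder to sys.path. The helper name add_project_root and its marker parameter are hypothetical, not part of any Databricks API, and the sketch assumes the notebook's working directory is the folder the notebook lives in.

```python
import sys
from pathlib import Path


def add_project_root(start, marker="my_package"):
    """Walk up from `start` until a directory containing `marker` is found,
    then append that directory to sys.path and return it as a string."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            root = str(candidate)
            if root not in sys.path:  # avoid duplicate entries on re-runs
                sys.path.append(root)
            return root
    raise FileNotFoundError(f"No parent of {start} contains {marker!r}")


# Intended use at the top of the notebook, before importing my_package
# (assumption: os.getcwd() returns the notebook's own directory):
# import os
# add_project_root(os.getcwd())
# from my_package.hello import hello_world
```

Because the project root is discovered rather than hardcoded, the same cell should work from a developer's Git folder and from the DAB deployment location, as long as my_package sits somewhere above the notebook in both places.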

Regards,

Brahma
