Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to import Python modules in a notebook?

bi123
New Contributor

I have a job with a notebook task that uses Python modules located in a different folder than the notebook itself. When I try to import a module in the notebook, it raises a ModuleNotFoundError. I solved the problem using sys.path.

But I am curious whether there is a more elegant way to handle this, maybe by setting a root path for the notebook task in the YAML configuration?
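For context (the screenshot did not survive), the sys.path workaround usually looks something like the sketch below; the `../src` layout is a hypothetical example, not taken from the original post:

```python
import os
import sys

# Hypothetical layout (an assumption for illustration): the notebook
# runs from ./notebooks and the modules live in a sibling ../src folder.
module_dir = os.path.abspath(os.path.join(os.getcwd(), "..", "src"))

# Prepend rather than append so local modules shadow same-named
# installed packages, and guard against inserting duplicates.
if module_dir not in sys.path:
    sys.path.insert(0, module_dir)
```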

3 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @bi123,

If you don't use declarative pipelines but a regular notebook task, then your approach is correct and aligned with what Databricks recommends in their documentation.

Here are some useful links to docs:

Work with Python and R modules | Databricks on AWS

Import Python modules from Git folders or workspace files | Databricks on AWS

lingareddy_Alva
Esteemed Contributor

Hi @bi123,

Good question. Your sys.path approach works, but yes, there are cleaner alternatives depending on your setup.

The Most Elegant: PYTHONPATH in the job YAML
If you're using Databricks Asset Bundles (DAB), you can set the Python path at the job or task level directly in databricks.yml:

resources:
  jobs:
    my_job:
      tasks:
        - task_key: my_notebook_task
          notebook_task:
            notebook_path: ./notebooks/my_notebook
          libraries:
            - whl: ./dist/my_utils-0.1.0-py3-none-any.whl  # a locally built wheel; pypi: entries must name a package on PyPI, not a local path

Or more directly, use job_cluster_key with a Spark environment variable:

yaml
new_cluster:
  spark_env_vars:
    PYTHONPATH: "/Workspace/path/to/src:/Workspace/path/to/shared"


This sets PYTHONPATH at the cluster level so every task on that cluster can resolve your modules without any sys.path manipulation in the notebook itself.
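As a quick sanity check inside the notebook, you can confirm the variable actually reached the Python process. This is a generic Python sketch, not a Databricks-specific API:

```python
import os
import sys

# PYTHONPATH is read once at interpreter startup, so entries set via
# spark_env_vars should already be on sys.path. This check surfaces a
# misconfiguration early instead of at the first failing import.
for entry in os.environ.get("PYTHONPATH", "").split(os.pathsep):
    if entry and entry not in sys.path:
        print(f"missing from sys.path: {entry}")
        sys.path.append(entry)  # fallback so imports still resolve
```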


Even Cleaner: Package Your Utilities

If src/ and shared/ are stable internal libraries, the proper solution is to give them a setup.py or pyproject.toml and install them as packages: 

yaml
libraries:
  - whl: ./dist/my_utils-0.1.0-py3-none-any.whl

DAB can build and deploy the wheel automatically. Then your notebook just does import my_utils with no path hacks at all.
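If you go the packaging route, the build setup is standard Python packaging rather than anything Databricks-specific. A minimal sketch, assuming a hypothetical `my_utils` package living under `src/` (names and paths are placeholders):

```toml
# pyproject.toml (minimal sketch; adjust name, version, and layout to your repo)
[project]
name = "my-utils"
version = "0.1.0"

[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
where = ["src"]
```

With this in place, an artifacts section in databricks.yml can build the wheel during `databricks bundle deploy`.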

In short: setting PYTHONPATH via spark_env_vars in the YAML is the most practical upgrade from what you have, and packaging is the right long-term answer if these utilities are shared across multiple jobs.

LR

shazi
New Contributor III

While this is a classic way to solve the problem, it can be brittle if your folder structure changes or if you share the notebook with others who have different file paths, especially in modern notebook environments.