Hi @bi123
Good question. Your sys.path approach works, but yes, there are cleaner alternatives depending on your setup.
The Most Elegant: PYTHONPATH in the job YAML
If you're using Databricks Asset Bundles (DAB), you can attach your code at the job or task level directly in databricks.yml. Note that a `pypi` library entry installs from a package index, not from a local path, so local code with a setup.py/pyproject.toml is attached as a wheel instead:

```yaml
resources:
  jobs:
    my_job:
      tasks:
        - task_key: my_notebook_task
          notebook_task:
            notebook_path: ./notebooks/my_notebook
          libraries:
            - whl: ./dist/*.whl # wheel built from your local setup.py/pyproject.toml
```
Or, more directly, point the task at a job cluster (`job_cluster_key`) whose definition sets a Spark environment variable:

```yaml
job_clusters:
  - job_cluster_key: main
    new_cluster:
      spark_env_vars:
        PYTHONPATH: "/Workspace/path/to/src:/Workspace/path/to/shared"
```
This sets PYTHONPATH at the cluster level so every task on that cluster can resolve your modules without any sys.path manipulation in the notebook itself.
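To see why this removes the need for any `sys.path.insert()` calls, here's a small local sketch (the `/Workspace` paths are just the placeholders from the YAML above): any directory listed in `PYTHONPATH` appears on `sys.path` automatically when the interpreter starts.

```python
import os
import subprocess
import sys

# Placeholder paths standing in for the Workspace folders in the YAML above.
extra = os.pathsep.join(["/Workspace/path/to/src", "/Workspace/path/to/shared"])

# Launch a fresh interpreter with PYTHONPATH set, the way the cluster would.
out = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.path)"],
    env=dict(os.environ, PYTHONPATH=extra),
    capture_output=True,
    text=True,
).stdout

# Both directories are importable with no sys.path manipulation in the code itself.
print("/Workspace/path/to/src" in out and "/Workspace/path/to/shared" in out)  # True
```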
Even Cleaner: Package Your Utilities
If src/ and shared/ are stable internal libraries, the proper solution is to give them a setup.py or pyproject.toml and install them as packages:
```yaml
libraries:
  - whl: ./dist/my_utils-0.1.0-py3-none-any.whl
```
DAB can build and deploy the wheel automatically. Then your notebook just does import my_utils with no path hacks at all.
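The automatic build step is wired up with an `artifacts` block in databricks.yml; a minimal sketch, assuming your utilities live in a folder with a setup.py/pyproject.toml (the `my_utils` name and path are hypothetical):

```yaml
artifacts:
  my_utils:            # hypothetical artifact name
    type: whl          # tells DAB to build a Python wheel
    path: ./my_utils   # folder containing your setup.py/pyproject.toml
```

With this in place, `databricks bundle deploy` builds the wheel and uploads it with the bundle, so the `libraries` entry above can reference it.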
In short: PYTHONPATH via spark_env_vars in the YAML is the most practical upgrade from what you have, and packaging is the right long-term answer if these utilities are shared across multiple jobs.
LR