Dear Databricks Community,
A couple of months ago we migrated our pipelines from importing dependencies with the %run command in notebooks to importing Python (.py) modules by adding the Workspace root of the repo/directory to sys.path. This solution worked for a couple of months until recently, when imports of modules in Git Folders started failing.
We observed new behaviour on All-Purpose clusters in Git Folders:
- The Workspace root path of the Git Folder is now added to sys.path by default
This configuration, however, still works in the Workspace directory where we deploy our code (separate compute), and it also works for Git Folders on Serverless. We use All-Purpose clusters in Dedicated mode for both scheduled jobs and development.
We reproduced this failure on various clusters:
Source | Compute | Test |
Git Folder | Serverless | Ok |
Workspace Directory | Serverless | Ok |
Git Folder | All-Purpose cluster | Fails |
Workspace Directory | All-Purpose cluster | Ok |
ModuleNotFoundError: No module named 'Libraries'
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File <command-4896020584242793>, line 1
----> 1 from Libraries.configuration_module import get_global_configuration
      3 global_config = get_global_configuration()
      4 environment = global_config["environment_code"]

File /databricks/python_shell/dbruntime/autoreload/discoverability/hook.py:72, in AutoreloadDiscoverabilityHook.pre_run_cell.<locals>.patched_import(name, *args, **kwargs)
     66 if not self._should_hint and (
     67 (module := sys.modules.get(absolute_name)) is not None and
     68 (fname := get_allowed_file_name_or_none(module)) is not None and
     69 (mtime := os.stat(fname).st_mtime) > self.last_mtime_by_modname.get(
     70 absolute_name, float("inf")) and not self._should_hint):
     71 self._should_hint = True
---> 72 module = self._original_builtins_import(name, *args, **kwargs)
     73 if (fname := fname or get_allowed_file_name_or_none(module)) is not None:
     74 mtime = mtime or os.stat(fname).st_mtime

ModuleNotFoundError: No module named 'Libraries'
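To compare the failing and working environments, a check along the following lines could be run in each notebook. This is only a sketch: find_package_locations is a hypothetical helper written for this post, not a Databricks API, and it looks only at the filesystem without attempting the import.

```python
import sys
from pathlib import Path


def find_package_locations(package_name, search_path=None):
    """Return the search-path entries that contain `package_name`.

    A hit is an entry holding either a directory or a .py file with that
    name. `search_path` defaults to the current sys.path.
    """
    entries = sys.path if search_path is None else search_path
    hits = []
    for entry in entries:
        # An empty string in sys.path means the current working directory.
        base = Path(entry or ".")
        if (base / package_name).is_dir() or (base / f"{package_name}.py").is_file():
            hits.append(str(base))
    return hits


if __name__ == "__main__":
    # Print every search-path entry, then the ones that contain Libraries.
    for entry in sys.path:
        print(entry)
    print("Entries containing 'Libraries':", find_package_locations("Libraries"))
```

An empty result on the All-Purpose cluster alongside a non-empty one on Serverless would suggest the difference lies in what ends up on sys.path rather than in the module itself.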
Repo structure:
--Libraries/
--NotebooksDirectory/tests/
Import statement used by a notebook located in NotebooksDirectory/tests/:
from Libraries.configuration_module import get_global_configuration
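For reference, the sys.path approach described at the top of this post can be sketched roughly as below. This is an illustration, not our exact code: find_repo_root and add_repo_root_to_sys_path are hypothetical names, and the sketch assumes the repo root can be recognised as the nearest parent directory that contains Libraries/.

```python
import sys
from pathlib import Path


def find_repo_root(start_dir, marker="Libraries"):
    """Walk upward from start_dir until a directory containing `marker` is found."""
    start = Path(start_dir).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None


def add_repo_root_to_sys_path(start_dir, marker="Libraries"):
    """Put the repo root at the front of sys.path (only once) and return it."""
    root = find_repo_root(start_dir, marker)
    if root is None:
        raise FileNotFoundError(f"No parent of {start_dir} contains {marker}/")
    root_str = str(root)
    if root_str not in sys.path:
        sys.path.insert(0, root_str)
    return root
```

In a notebook under NotebooksDirectory/tests/, this would be called with the notebook's own directory (for example os.getcwd(), assuming the runtime sets the working directory to the notebook's folder) before the from Libraries... import.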
1. What's the recommended way to resolve this problem?
2. Have there been any changes in the last two months to how the Git Folder structure is mapped?
3. Is there a supported way to import Python workspace-file modules from notebooks located in a nested repo structure like ours?
Thanks for your help!