yesterday
Hi,
After moving from Databricks Runtime 17.1 to 17.2, pkgutil's walk_packages suddenly doesn't identify any packages within my repository anymore.
This is my example code:
import pkgutil
import os
packages = pkgutil.walk_packages([os.getcwd()])
print(list(packages))

Previously it found all my packages, but since the update to 17.2 it doesn't work anymore.
yesterday
Hello @Maxrb, I did some digging on my end and I have some suggestions and/or hints to help you further troubleshoot your issue.
What you're running into lines up with a few runtime-specific behaviors that changed around Databricks Runtime 17.x, and together they explain why package discovery suddenly went quiet after the move to 17.2.
What likely changed
First, the current working directory on Databricks is the directory of the running notebook or script, not necessarily your repo root. If your packages live somewhere else, say at the repo root or under a src folder, then pkgutil.walk_packages([os.getcwd()]) will simply never see them. It's scanning the wrong place.
Second, when you're importing Python code from workspace files or Git folders that live outside the notebook's directory, you need to be explicit about sys.path. The root of a Git folder is automatically added, but subdirectories are not. And if you're working with workspace files, the path you append must include the /Workspace/… prefix. If Python can't see the directory, pkgutil won't either.
Finally, across the 17.x line there were changes to Python import hooks that tightened up how workspace paths are handled. A related issue showed up in 17.3 with wheel tasks, but even in 17.2 the behavior is more strict and predictable. Code that implicitly relied on os.getcwd() pointing at the repo root can now fail if the notebook lives in a subfolder.
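If you want to see what the runtime has wired into the import machinery on your cluster, a quick read-only peek is enough (nothing here modifies state):

import sys

print(sys.path)        # is your repo root / src directory actually on here?
print(sys.path_hooks)  # on 17.x you may see a Databricks workspace path hook registered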
Quick sanity checks
Before changing anything, it's worth confirming what Python thinks is going on:
Print the working directory and its contents:
print(os.getcwd())
print(os.listdir(os.getcwd()))
This tells you immediately whether youโre scanning a directory that actually contains your packages.
Also double-check that your packages include an __init__.py. pkgutil.walk_packages only discovers classic packages; it won't enumerate PEP 420 namespace packages.
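A quick way to spot folders that look like packages but won't be picked up is a plain directory scan (nothing Databricks-specific here):

import os

root = os.getcwd()
for name in sorted(os.listdir(root)):
    candidate = os.path.join(root, name)
    if os.path.isdir(candidate) and not os.path.isfile(os.path.join(candidate, "__init__.py")):
        print("missing __init__.py:", name)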
Recommended fixes
Which fix you choose really depends on where your code lives.
Option 1: Point pkgutil directly at your repo code (my preferred approach)
If your packages live under something like /Workspace/Repos/<user>/<repo>/src, be explicit. Add that directory to sys.path and walk it directly:
import os
import sys
import pkgutil
repo_root = "/Workspace/Repos/<user>/<repo>"
src_dir = os.path.join(repo_root, "src")  # or repo_root if you don't use src/
if src_dir not in sys.path:
    sys.path.append(src_dir)
packages = list(pkgutil.walk_packages([src_dir]))
print(packages)
This removes all ambiguity about what you're scanning and what Python can import.
Option 2: Let sys.path do the work
If your notebook lives at the Git folder root (not nested), that root is already on sys.path. In that case you can just let pkgutil walk everything Python already knows about:
import pkgutil
packages = list(pkgutil.walk_packages())
print(packages)
This only works if your layout is clean and flat, but when it applies, it's the simplest solution.
Option 3: Compute the repo root from the notebook location
If your notebook is nested a few levels down, compute the repo root relative to the working directory and add it:
import os
import sys
import pkgutil
cwd = os.getcwd()
repo_root = os.path.dirname(os.path.dirname(cwd)) # adjust depth as needed
if repo_root not in sys.path:
    sys.path.append(repo_root)
packages = list(pkgutil.walk_packages([repo_root]))
print(packages)
Why os.getcwd() started betraying you
In 17.x, Databricks is much more consistent about setting CWD to the notebook's directory. If your code used to run from a location that happened to be the repo root, and now runs from a subfolder, then walk_packages([os.getcwd()]) will return nothing because it's doing exactly what you asked: scanning the wrong directory.
That behavior lines up with the documented CWD semantics and the newer guidance around workspace files and Git folders. Nothing is "broken" so much as more strictly defined.
Hope these tips get you over the finish line.
Cheers, Lou.
yesterday
Thanks for the detailed answer @Louis_Frolio,
Unfortunately, none of this is working. I have a notebook in my repo root, I checked sys.path and the cwd, and I tried all the options you mentioned, and still it doesn't work in DBR 17.2+.
Simply put, I see all the folders there in listdir, but somehow it doesn't pick up any packages.
Do you not experience the same with local packages?
Cheers,
Max
yesterday
Hmmm, I have not personally experienced this. I dug a little deeper into our internal docs and leveraged some internal tools to put together another approach for you. Please give this a try and let me know.
You're running into a subtle but very real behavior change in Databricks Runtime 17.2, and it shows up most clearly when using pkgutil.walk_packages() with the current working directory.
This isn't your code suddenly "breaking." It's the interaction between Python's import system and how DBR 17.2 (now on Python 3.12) treats discovery paths.
Let's walk through it.
pkgutil.walk_packages() doesn't just crawl a filesystem path. It expects that path to behave like a real Python import location:
• The directory must contain proper packages (__init__.py)
• And just as importantly, the directory must be reachable through Python's import machinery
In DBR 17.2, relying on os.getcwd() alone is no longer sufficient. Even if the files are there, Python won't reliably discover them unless that directory is also present on sys.path. Earlier runtimes were more forgiving; Python 3.12 is not.
That's why walk_packages() suddenly appears to return nothing.
This aligns your filesystem view with Python's import system and works consistently:
import pkgutil
import os
import sys
cwd = os.getcwd()
if cwd not in sys.path:
    sys.path.insert(0, cwd)
packages = pkgutil.walk_packages([cwd])
print(list(packages))
This is the safest pattern and the one I recommend in most Databricks notebooks.
If your code lives in a repo or workspace folder, be explicit about where packages live instead of relying on the notebook's working directory:
import pkgutil
import os
repo_path = os.path.abspath("/Workspace/path/to/your/repo")
packages = pkgutil.walk_packages([repo_path])
print(list(packages))
This avoids ambiguity entirely and plays nicely with Git folders and workspace imports.
Make sure your package structure is real Python, not just folders that "look" like packages.
Every directory you expect to be discovered must include an __init__.py. Python 3.12 is noticeably stricter here, and DBR 17.2 surfaces that reality.
DBR 17.2 includes a Python upgrade to 3.12.x, along with internal changes to import handling. pkgutil.walk_packages() has always required paths to be importable, but earlier runtimes were more lenient when the current working directory happened to work by accident.
In short:
What used to work implicitly now needs to be explicit.
That's not a regression; it's Python behaving the way it always documented itself.
Regards, Louis.
21 hours ago
Hi @Louis_Frolio,
Unfortunately, whatever I am doing (adding all the paths, etc., trying all your solutions), it just simply doesn't work. When I run pkgutil on, for instance, pyspark.sql's __path__, it simply works. To me it looks like it doesn't find anything inside the workspace, while in DBR <17.2 all of these things were working. I don't see any files being discovered whatsoever; it just returns an empty array.
I'm a bit lost as to what could be happening here. I tried it inside a repo and in a normal workspace folder, but no matter what I try it always returns an empty list when the "package" is inside my workspace.
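For reference, the comparison I'm doing looks roughly like this (the workspace path is a placeholder):

import pkgutil
import pyspark.sql

# Walking an installed library's __path__ works fine...
print(list(pkgutil.walk_packages(pyspark.sql.__path__)))

# ...but walking a directory inside /Workspace/ comes back empty on 17.2+.
print(list(pkgutil.walk_packages(["/Workspace/Repos/<user>/<repo>"])))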
20 hours ago
I did a bit of a deep dive into the source code of pkgutil's walk_packages, and I noticed this happening:
def get_importer(path_item):
    path_item = os.fsdecode(path_item)
    try:
        importer = sys.path_importer_cache[path_item]
    except KeyError:
        # simplified; the real implementation falls back to sys.path_hooks here
        importer = None
    return importer

For a given path in DBR <17.2, like `/Workspace/Repos/<user>/<repo>`, this returns a normal FileFinder object; when I try on >= 17.2 it returns <dbruntime.workspace_import_machinery._WorkspacePathEntryFinder object at 0x.....>.
Looking further, this means it will never find any files, and thus it won't work for imports within the repo.
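If anyone wants to reproduce the check, here is a minimal sketch (the repo path and package name are placeholders):

import importlib.machinery
import pkgutil

path = "/Workspace/Repos/<user>/<repo>"  # placeholder path

# Finder picked via sys.path_importer_cache / sys.path_hooks:
# a FileFinder on <17.2, dbruntime's _WorkspacePathEntryFinder on >=17.2.
print(type(pkgutil.get_importer(path)))

# Forcing a plain filesystem FileFinder on the same directory shows whether
# the files themselves would be discoverable without the workspace hook.
loader_details = (importlib.machinery.SourceFileLoader,
                  importlib.machinery.SOURCE_SUFFIXES)
finder = importlib.machinery.FileFinder(path, loader_details)
print(finder.find_spec("my_package"))  # placeholder package name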
18 hours ago
Hey @Maxrb,
Just thinking out loud here, but this might be worth experimenting with.
You could try using a Unity Catalog Volume as a lightweight package repository. Volumes can act as a secure, governed home for Python wheels (and JARs), and Databricks explicitly supports installing libraries directly from volume paths onto clusters, notebooks, and jobs. In fact, for UC-enabled workspaces, volumes are the recommended pattern for this exact use case.
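If you went that route, installing from a volume inside a notebook is just a pip install against the volume path (the path below is a placeholder):

%pip install /Volumes/<catalog>/<schema>/<volume>/my_package-0.1.0-py3-none-any.whl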
Just a thought.
Cheers, Lou.
17 hours ago
Hey @Louis_Frolio,
Thanks for thinking along. The whole idea is that this package is not installed as a jar, wheel, or anything else; it's a living module in my repository. For production I don't think I will have this issue, as I install my repo as a wheel file using Databricks Asset Bundles and I expect it to still be discovered using pkgutil, but currently, when developing in Databricks, it's breaking. Note that locally in VS Code using Databricks Connect everything is still working fine.
I checked all the updates in DBR 17.2 and couldn't find anything specifically related to this.
I don't have the capacity to investigate this any further right now, but I doubt that the current behaviour is correct.
But again, thanks for thinking along!
Cheers,
Max