Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

pkgutil walk_packages stopped working in DBR 17.2

Maxrb
Visitor

Hi,

After moving from Databricks Runtime 17.1 to 17.2, pkgutil.walk_packages suddenly no longer identifies any packages in my repository.

This is my example code:

import pkgutil
import os

packages = pkgutil.walk_packages([os.getcwd()])
print(list(packages))

Previously it found all my packages, but since the update to 17.2 it doesn't work anymore.

3 REPLIES

Louis_Frolio
Databricks Employee

Hello @Maxrb, I did some digging on my end and have some suggestions and hints to help you troubleshoot this further.

What you're running into lines up with a few runtime-specific behaviors that changed around Databricks Runtime 17.x, and together they explain why package discovery suddenly went quiet after the move to 17.2.

What likely changed

First, the current working directory on Databricks is the directory of the running notebook or script, not necessarily your repo root. If your packages live somewhere else, say at the repo root or under a src folder, then pkgutil.walk_packages([os.getcwd()]) will simply never see them. It's scanning the wrong place.

Second, when you're importing Python code from workspace files or Git folders that live outside the notebook's directory, you need to be explicit about sys.path. The root of a Git folder is automatically added, but subdirectories are not. And if you're working with workspace files, the path you append must include the /Workspace/… prefix. If Python can't see the directory, pkgutil won't either.
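A quick way to test that second point (a minimal sketch; `target` is just whichever directory you plan to scan):

```python
import os
import sys

# Visibility check: is the directory you intend to walk actually
# reachable through Python's import machinery?
target = os.getcwd()  # or an explicit /Workspace/... path
visible = target in sys.path
print(visible)  # False means imports from it may silently fail
```

If this prints False, fix sys.path first and only then worry about what walk_packages returns.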

Finally, across the 17.x line there were changes to Python import hooks that tightened up how workspace paths are handled. A related issue showed up in 17.3 with wheel tasks, but even in 17.2 the behavior is more strict and predictable. Code that implicitly relied on os.getcwd() pointing at the repo root can now fail if the notebook lives in a subfolder.

Quick sanity checks

Before changing anything, it's worth confirming what Python thinks is going on:

Print the working directory and its contents:

print(os.getcwd())

print(os.listdir(os.getcwd()))

This tells you immediately whether you're scanning a directory that actually contains your packages.

Also double-check that your packages include an __init__.py. pkgutil.walk_packages only discovers classic packages; it won't enumerate PEP 420 namespace packages.
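To see that difference concretely, here is a self-contained sketch using a scratch directory (the package names are invented for the demo):

```python
import os
import pkgutil
import tempfile

# Build two candidate packages in a scratch directory: one classic
# (with __init__.py) and one PEP 420 namespace-style (without).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "classic_pkg"))
open(os.path.join(root, "classic_pkg", "__init__.py"), "w").close()
os.makedirs(os.path.join(root, "namespace_pkg"))

found = [m.name for m in pkgutil.walk_packages([root])]
print(found)  # ['classic_pkg'] -- the bare folder is skipped
```

If your repo folders show up in os.listdir but not here, a missing __init__.py is the usual culprit.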

Recommended fixes

Which fix you choose really depends on where your code lives.

Option 1: Point pkgutil directly at your repo code (my preferred approach)

If your packages live under something like /Workspace/Repos/<user>/<repo>/src, be explicit. Add that directory to sys.path and walk it directly:

import os
import sys
import pkgutil

repo_root = "/Workspace/Repos/<user>/<repo>"
src_dir = os.path.join(repo_root, "src")  # or repo_root if you don't use src/

if src_dir not in sys.path:
    sys.path.append(src_dir)

packages = list(pkgutil.walk_packages([src_dir]))
print(packages)

This removes all ambiguity about what you're scanning and what Python can import.

Option 2: Let sys.path do the work

If your notebook lives at the Git folder root (not nested), that root is already on sys.path. In that case you can just let pkgutil walk everything Python already knows about:

import pkgutil

packages = list(pkgutil.walk_packages())
print(packages)

This only works if your layout is clean and flat, but when it applies, it's the simplest solution.

Option 3: Compute the repo root from the notebook location

If your notebook is nested a few levels down, compute the repo root relative to the working directory and add it:

import os
import sys
import pkgutil

cwd = os.getcwd()
repo_root = os.path.dirname(os.path.dirname(cwd))  # adjust depth as needed

if repo_root not in sys.path:
    sys.path.append(repo_root)

packages = list(pkgutil.walk_packages([repo_root]))
print(packages)

Why os.getcwd() started betraying you

In 17.x, Databricks is much more consistent about setting the CWD to the notebook's directory. If your code used to run from a location that happened to be the repo root, and now runs from a subfolder, then walk_packages([os.getcwd()]) will return nothing because it's doing exactly what you asked: scanning the wrong directory.

That behavior lines up with the documented CWD semantics and the newer guidance around workspace files and Git folders. Nothing is "broken" so much as more strictly defined.
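That "wrong directory" failure mode is easy to reproduce (a sketch using a throwaway empty directory):

```python
import pkgutil
import tempfile

# Walking a directory that holds no packages yields nothing -- which
# is exactly what scanning a notebook subfolder looks like.
empty_dir = tempfile.mkdtemp()
result = list(pkgutil.walk_packages([empty_dir]))
print(result)  # []
```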

Hope these tips get you over the finish line.

Cheers, Lou.

Thanks for the detailed answer, @Louis_Frolio.

Unfortunately, none of this is working. I have a notebook in my repo root, I checked sys.path and the cwd, and I tried all the options you mentioned, but it still doesn't work on DBR 17.2+.

Simply put, I see all the folders there in listdir, but somehow it doesn't pick up any packages.

Do you not experience the same with local packages?

 

Cheers,

Max

Louis_Frolio
Databricks Employee

Hmmm, I have not personally experienced this. I dug a little deeper into our internal docs and leveraged some internal tools to put together another approach for you. Please give this a try and let me know.

You're running into a subtle but very real behavior change in Databricks Runtime 17.2, and it shows up most clearly when using pkgutil.walk_packages() with the current working directory.

This isn't your code suddenly "breaking." It's the interaction between Python's import system and how DBR 17.2 (now on Python 3.12) treats discovery paths.

Let's walk through it.

The root cause

pkgutil.walk_packages() doesn't just crawl a filesystem path. It expects that path to behave like a real Python import location:

- The directory must contain proper packages (__init__.py)

- And just as importantly, the directory must be reachable through Python's import machinery

In DBR 17.2, relying on os.getcwd() alone is no longer sufficient. Even if the files are there, Python won't reliably discover them unless that directory is also present on sys.path. Earlier runtimes were more forgiving; Python 3.12 is not.

That's why walk_packages() suddenly appears to return nothing.
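The sys.path dependency is easiest to see with nested packages, since walk_packages must import a package in order to recurse into it. A self-contained sketch (the package names are invented):

```python
import os
import pkgutil
import sys
import tempfile

# A classic package with a nested subpackage, built in a scratch
# directory so the example is self-contained.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "toppkg_demo", "subpkg"))
open(os.path.join(root, "toppkg_demo", "__init__.py"), "w").close()
open(os.path.join(root, "toppkg_demo", "subpkg", "__init__.py"), "w").close()

# Without the scan root on sys.path, walk_packages cannot import the
# parent package to recurse into it, so the subpackage is missed.
before = [m.name for m in pkgutil.walk_packages([root])]

sys.path.insert(0, root)
after = [m.name for m in pkgutil.walk_packages([root])]

print(before)  # ['toppkg_demo']
print(after)   # ['toppkg_demo', 'toppkg_demo.subpkg']
```

By default walk_packages swallows the ImportError, which is why the failure is silent rather than loud.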

The most reliable fix

Option 1: Explicitly add the directory to sys.path

This aligns your filesystem view with Pythonโ€™s import system and works consistently:

import pkgutil
import os
import sys

cwd = os.getcwd()
if cwd not in sys.path:
    sys.path.insert(0, cwd)

packages = pkgutil.walk_packages([cwd])
print(list(packages))

This is the safest pattern and the one I recommend in most Databricks notebooks.

A cleaner alternative for repos

Option 2: Use an absolute workspace path

If your code lives in a repo or workspace folder, be explicit about where packages live instead of relying on the notebook's working directory:

import pkgutil
import os

repo_path = os.path.abspath("/Workspace/path/to/your/repo")
packages = pkgutil.walk_packages([repo_path])
print(list(packages))

This avoids ambiguity entirely and plays nicely with Git folders and workspace imports.

One more thing to double-check

Make sure your package structure is real Python, not just folders that "look" like packages.

Every directory you expect to be discovered must include an __init__.py. Python 3.12 is noticeably stricter here, and DBR 17.2 surfaces that reality.
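If you want to audit that quickly, here is a hypothetical helper (missing_init is not a library function, just a sketch) that flags first-level folders walk_packages would skip:

```python
import os
import tempfile

def missing_init(root):
    """List first-level directories under root that lack an __init__.py
    and would therefore be skipped by pkgutil.walk_packages."""
    out = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path) and not os.path.exists(
            os.path.join(path, "__init__.py")
        ):
            out.append(name)
    return out

# Demo against a scratch directory with one proper and one bare folder.
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, "good_pkg"))
open(os.path.join(demo, "good_pkg", "__init__.py"), "w").close()
os.makedirs(os.path.join(demo, "bare_folder"))
print(missing_init(demo))  # ['bare_folder']
```

Running it against your repo root (missing_init(os.getcwd())) tells you in one line whether this is the issue.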

Why this showed up in 17.2

DBR 17.2 includes a Python upgrade to 3.12.x, along with internal changes to import handling. pkgutil.walk_packages() has always required paths to be importable, but earlier runtimes were more lenient when the current working directory happened to work by accident.

In short:

What used to work implicitly now needs to be explicit.

That's not a regression; it's Python behaving the way it always documented itself.

Regards, Louis.
