<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: pkgutils walk_packages stopped working in DBR 17.2 in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142238#M51906</link>
    <description>&lt;P&gt;I did a bit of a deep dive into the source code of pkgutil.walk_packages, and noticed what happens in get_importer (quoted below from CPython's pkgutil; on a cache miss it falls back to sys.path_hooks):&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def get_importer(path_item):
    path_item = os.fsdecode(path_item)
    try:
        # First consult the cache of previously resolved finders.
        importer = sys.path_importer_cache[path_item]
    except KeyError:
        # Cache miss: ask each registered path hook for a finder.
        for path_hook in sys.path_hooks:
            try:
                importer = path_hook(path_item)
                sys.path_importer_cache.setdefault(path_item, importer)
                break
            except ImportError:
                pass
        else:
            importer = None
    return importer&lt;/LI-CODE&gt;&lt;P&gt;For a given path on DBR &amp;lt;17.2, such as `/Workspace/Repos/&amp;lt;user&amp;gt;/&amp;lt;repo&amp;gt;`, this returns a normal FileFinder object; when I try on &amp;gt;=17.2 it returns&amp;nbsp;&amp;lt;dbruntime.workspace_import_machinery._WorkspacePathEntryFinder object at 0x.....&amp;gt;.&lt;/P&gt;&lt;P&gt;Looking further, this means it will never find any files, and so imports within the repo do not work.&lt;/P&gt;</description>
    <pubDate>Fri, 19 Dec 2025 12:05:45 GMT</pubDate>
    <dc:creator>Maxrb</dc:creator>
    <dc:date>2025-12-19T12:05:45Z</dc:date>
    <item>
      <title>pkgutils walk_packages stopped working in DBR 17.2</title>
      <link>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142144#M51889</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;After moving from Databricks runtime 17.1 to 17.2 suddenly my pkgutils walk_packages doesn't identify any packages within my repository anymore.&lt;/P&gt;&lt;P&gt;This is my example code:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import pkgutil
import os

packages = pkgutil.walk_packages([os.getcwd()])
print(list(packages))&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;previously it found all my packages but since the update to 17.2 it doesn't work anymore.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Dec 2025 09:41:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142144#M51889</guid>
      <dc:creator>Maxrb</dc:creator>
      <dc:date>2025-12-18T09:41:00Z</dc:date>
    </item>
    <item>
      <title>Re: pkgutils walk_packages stopped working in DBR 17.2</title>
      <link>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142164#M51892</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/201317"&gt;@Maxrb&lt;/a&gt;&amp;nbsp;, I did some digging on my end and I have some suggestions and/or hints to help you further troubleshoot your issue.&lt;/P&gt;
&lt;P class="p1"&gt;What you’re running into lines up with a few runtime-specific behaviors that changed around Databricks Runtime 17.x, and together they explain why package discovery suddenly went quiet after the move to 17.2.&lt;/P&gt;
&lt;P class="p1"&gt;What likely changed&lt;/P&gt;
&lt;P class="p1"&gt;First, the current working directory on Databricks is the directory of the running notebook or script, not necessarily your repo root. If your packages live somewhere else—say at the repo root or under a src folder—then pkgutil.walk_packages([os.getcwd()]) will simply never see them. It’s scanning the wrong place.&lt;/P&gt;
&lt;P class="p1"&gt;Second, when you’re importing Python code from workspace files or Git folders that live outside the notebook’s directory, you need to be explicit about sys.path. The root of a Git folder is automatically added, but subdirectories are not. And if you’re working with workspace files, the path you append must include the /Workspace/… prefix. If Python can’t see the directory, pkgutil won’t either.&lt;/P&gt;
&lt;P class="p1"&gt;Finally, across the 17.x line there were changes to Python import hooks that tightened up how workspace paths are handled. A related issue showed up in 17.3 with wheel tasks, but even in 17.2 the behavior is more strict and predictable. Code that implicitly relied on os.getcwd() pointing at the repo root can now fail if the notebook lives in a subfolder.&lt;/P&gt;
&lt;P class="p1"&gt;Quick sanity checks&lt;/P&gt;
&lt;P class="p1"&gt;Before changing anything, it’s worth confirming what Python thinks is going on:&lt;/P&gt;
&lt;P class="p1"&gt;Print the working directory and its contents:&lt;/P&gt;
&lt;P class="p1"&gt;print(os.getcwd())&lt;/P&gt;
&lt;P class="p1"&gt;print(os.listdir(os.getcwd()))&lt;/P&gt;
&lt;P class="p1"&gt;This tells you immediately whether you’re scanning a directory that actually contains your packages.&lt;/P&gt;
&lt;P class="p1"&gt;Also double-check that your packages include an &lt;SPAN class="s2"&gt;&lt;STRONG&gt;init&lt;/STRONG&gt;&lt;/SPAN&gt;.py. pkgutil.walk_packages only discovers classic packages; it won’t enumerate PEP 420 namespace packages.&lt;/P&gt;
&lt;P class="p1"&gt;Recommended fixes&lt;/P&gt;
&lt;P class="p1"&gt;Which fix you choose really depends on where your code lives.&lt;/P&gt;
&lt;P class="p1"&gt;Option 1: Point pkgutil directly at your repo code (my preferred approach)&lt;/P&gt;
&lt;P class="p1"&gt;If your packages live under something like /Workspace/Repos///src, be explicit. Add that directory to sys.path and walk it directly:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import os
import sys
import pkgutil

repo_root = "/Workspace/Repos/&amp;lt;user&amp;gt;/&amp;lt;repo&amp;gt;"
src_dir = os.path.join(repo_root, "src")  # or repo_root if you don’t use src/

if src_dir not in sys.path:
    sys.path.append(src_dir)

packages = list(pkgutil.walk_packages([src_dir]))
print(packages)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P class="p1"&gt;This removes all ambiguity about what you’re scanning and what Python can import.&lt;/P&gt;
&lt;P class="p1"&gt;Option 2: Let sys.path do the work&lt;/P&gt;
&lt;P class="p1"&gt;If your notebook lives at the Git folder root (not nested), that root is already on sys.path. In that case you can just let pkgutil walk everything Python already knows about:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import pkgutil

packages = list(pkgutil.walk_packages())
print(packages)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P class="p1"&gt;This only works if your layout is clean and flat, but when it applies, it’s the simplest solution.&lt;/P&gt;
&lt;P class="p1"&gt;Option 3: Compute the repo root from the notebook location&lt;/P&gt;
&lt;P class="p1"&gt;If your notebook is nested a few levels down, compute the repo root relative to the working directory and add it:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import os
import sys
import pkgutil

cwd = os.getcwd()
repo_root = os.path.dirname(os.path.dirname(cwd))  # adjust depth as needed

if repo_root not in sys.path:
    sys.path.append(repo_root)

packages = list(pkgutil.walk_packages([repo_root]))
print(packages)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P class="p1"&gt;Why os.getcwd() started betraying you&lt;/P&gt;
&lt;P class="p1"&gt;In 17.x, Databricks is much more consistent about setting CWD to the notebook’s directory. If your code used to run from a location that happened to be the repo root—and now runs from a subfolder—then walk_packages([os.getcwd()]) will return nothing because it’s doing exactly what you asked: scanning the wrong directory.&lt;/P&gt;
&lt;P class="p1"&gt;That behavior lines up with the documented CWD semantics and the newer guidance around workspace files and Git folders. Nothing is “broken” so much as more strictly defined.&lt;/P&gt;
&lt;P class="p1"&gt;Hope these tips get you over the finish line.&lt;/P&gt;
&lt;P class="p1"&gt;Cheers, Lou.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Dec 2025 14:08:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142164#M51892</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-12-18T14:08:39Z</dc:date>
    </item>
    <item>
      <title>Re: pkgutils walk_packages stopped working in DBR 17.2</title>
      <link>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142170#M51894</link>
      <description>&lt;P&gt;Thanks for the detailed answer&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Unfortunately, none of this is working. I have a notebook in my repo root; I checked sys.path and the cwd, and tried all the options you mentioned, and it still doesn't work on DBR 17.2+.&lt;/P&gt;&lt;P&gt;Simply put, I see all the folders in listdir, but somehow it doesn't pick up any packages.&lt;/P&gt;&lt;P&gt;Do you not experience the same with local packages?&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;Max&lt;/P&gt;</description>
      <pubDate>Thu, 18 Dec 2025 14:54:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142170#M51894</guid>
      <dc:creator>Maxrb</dc:creator>
      <dc:date>2025-12-18T14:54:13Z</dc:date>
    </item>
    <item>
      <title>Re: pkgutils walk_packages stopped working in DBR 17.2</title>
      <link>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142192#M51898</link>
      <description>&lt;P&gt;Hmmm, I have not personally experienced this. I dug a little deeper into our internal docs and leveraged some internal tools to put together another approach for you. Please give this a try and let me know.&lt;/P&gt;
&lt;P class="p1"&gt;You’re running into a subtle but very real behavior change in &lt;SPAN class="s2"&gt;&lt;STRONG&gt;Databricks Runtime 17.2&lt;/STRONG&gt;&lt;/SPAN&gt;, and it shows up most clearly when using &lt;SPAN class="s3"&gt;pkgutil.walk_packages()&lt;/SPAN&gt; with the current working directory.&lt;/P&gt;
&lt;P class="p1"&gt;This isn’t your code suddenly “breaking.” It’s the interaction between Python’s import system and how DBR 17.2 (now on Python 3.12) treats discovery paths.&lt;/P&gt;
&lt;P class="p1"&gt;Let’s walk through it.&lt;/P&gt;
&lt;H2&gt;&lt;STRONG&gt;The root cause&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P class="p1"&gt;&lt;SPAN class="s3"&gt;pkgutil.walk_packages()&lt;/SPAN&gt; doesn’t just crawl a filesystem path. It expects that path to behave like a &lt;SPAN class="s2"&gt;&lt;STRONG&gt;real Python import location&lt;/STRONG&gt;&lt;/SPAN&gt;:&lt;/P&gt;
&lt;P class="p1"&gt;• The directory must contain proper packages (&lt;SPAN class="s3"&gt;__init__.py&lt;/SPAN&gt;)&lt;/P&gt;
&lt;P class="p1"&gt;• And just as importantly, the directory must be &lt;SPAN class="s2"&gt;&lt;STRONG&gt;reachable through Python’s import machinery&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="p1"&gt;In DBR 17.2, relying on &lt;SPAN class="s3"&gt;os.getcwd()&lt;/SPAN&gt; alone is no longer sufficient. Even if the files are there, Python won’t reliably discover them unless that directory is also present on &lt;SPAN class="s3"&gt;sys.path&lt;/SPAN&gt;. Earlier runtimes were more forgiving; Python 3.12 is not.&lt;/P&gt;
&lt;P class="p1"&gt;That’s why &lt;SPAN class="s3"&gt;walk_packages()&lt;/SPAN&gt; suddenly appears to return nothing.&lt;/P&gt;
&lt;H2&gt;&lt;STRONG&gt;The most reliable fix&lt;/STRONG&gt;&lt;/H2&gt;
&lt;H3&gt;&lt;STRONG&gt;Option 1: Explicitly add the directory to sys.path&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P class="p1"&gt;This aligns your filesystem view with Python’s import system and works consistently:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import pkgutil
import os
import sys

cwd = os.getcwd()
if cwd not in sys.path:
    sys.path.insert(0, cwd)

packages = pkgutil.walk_packages([cwd])
print(list(packages))&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P class="p1"&gt;This is the safest pattern and the one I recommend in most Databricks notebooks.&lt;/P&gt;
&lt;H2&gt;&lt;STRONG&gt;A cleaner alternative for repos&lt;/STRONG&gt;&lt;/H2&gt;
&lt;H3&gt;&lt;STRONG&gt;Option 2: Use an absolute workspace path&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P class="p1"&gt;If your code lives in a repo or workspace folder, be explicit about where packages live instead of relying on the notebook’s working directory:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import pkgutil
import os

repo_path = os.path.abspath("/Workspace/path/to/your/repo")
packages = pkgutil.walk_packages([repo_path])
print(list(packages))&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P class="p1"&gt;This avoids ambiguity entirely and plays nicely with Git folders and workspace imports.&lt;/P&gt;
&lt;H2&gt;&lt;STRONG&gt;One more thing to double-check&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P class="p1"&gt;Make sure your package structure is real Python, not just folders that “look” like packages.&lt;/P&gt;
&lt;P class="p1"&gt;Every directory you expect to be discovered must include an &lt;SPAN class="s2"&gt;__init__.py&lt;/SPAN&gt;. Python 3.12 is noticeably stricter here, and DBR 17.2 surfaces that reality.&lt;/P&gt;
&lt;H2&gt;&lt;STRONG&gt;Why this showed up in 17.2&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P class="p1"&gt;DBR 17.2 includes a Python upgrade to &lt;SPAN class="s3"&gt;&lt;STRONG&gt;3.12.x&lt;/STRONG&gt;&lt;/SPAN&gt;, along with internal changes to import handling. &lt;SPAN class="s2"&gt;pkgutil.walk_packages()&lt;/SPAN&gt; has always required paths to be importable—but earlier runtimes were more lenient when the current working directory happened to work by accident.&lt;/P&gt;
&lt;P class="p1"&gt;In short:&lt;/P&gt;
&lt;P class="p1"&gt;What used to work implicitly now needs to be explicit.&lt;/P&gt;
&lt;P class="p1"&gt;That’s not a regression—it’s Python behaving the way it always documented itself.&lt;/P&gt;
&lt;P class="p1"&gt;Regards, Louis.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Dec 2025 17:35:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142192#M51898</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-12-18T17:35:17Z</dc:date>
    </item>
    <item>
      <title>Re: pkgutils walk_packages stopped working in DBR 17.2</title>
      <link>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142227#M51904</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Unfortunately, whatever I do (adding all the paths, trying all of your solutions), it simply doesn't work. When I run pkgutil on, for instance, the pyspark.sql package's __path__, it works fine. It looks like it doesn't find anything inside the workspace, while on DBR &amp;lt;17.2 all of this was working. I don't see any files being discovered whatsoever; it just returns an empty list.&lt;/P&gt;&lt;P&gt;I'm a bit lost as to what could be happening here. I tried it inside a repo and in a normal workspace folder, but no matter what I try it always returns an empty list when the "package" is inside my workspace.&lt;/P&gt;</description>
      <pubDate>Fri, 19 Dec 2025 10:49:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142227#M51904</guid>
      <dc:creator>Maxrb</dc:creator>
      <dc:date>2025-12-19T10:49:26Z</dc:date>
    </item>
    <item>
      <title>Re: pkgutils walk_packages stopped working in DBR 17.2</title>
      <link>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142238#M51906</link>
      <description>&lt;P&gt;I did a bit of a deep dive into the source code of pkgutil.walk_packages, and noticed what happens in get_importer (quoted below from CPython's pkgutil; on a cache miss it falls back to sys.path_hooks):&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def get_importer(path_item):
    path_item = os.fsdecode(path_item)
    try:
        # First consult the cache of previously resolved finders.
        importer = sys.path_importer_cache[path_item]
    except KeyError:
        # Cache miss: ask each registered path hook for a finder.
        for path_hook in sys.path_hooks:
            try:
                importer = path_hook(path_item)
                sys.path_importer_cache.setdefault(path_item, importer)
                break
            except ImportError:
                pass
        else:
            importer = None
    return importer&lt;/LI-CODE&gt;&lt;P&gt;For a given path on DBR &amp;lt;17.2, such as `/Workspace/Repos/&amp;lt;user&amp;gt;/&amp;lt;repo&amp;gt;`, this returns a normal FileFinder object; when I try on &amp;gt;=17.2 it returns&amp;nbsp;&amp;lt;dbruntime.workspace_import_machinery._WorkspacePathEntryFinder object at 0x.....&amp;gt;.&lt;/P&gt;&lt;P&gt;Looking further, this means it will never find any files, and so imports within the repo do not work.&lt;/P&gt;</description>
      <pubDate>Fri, 19 Dec 2025 12:05:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142238#M51906</guid>
      <dc:creator>Maxrb</dc:creator>
      <dc:date>2025-12-19T12:05:45Z</dc:date>
    </item>
    <item>
      <title>Re: pkgutils walk_packages stopped working in DBR 17.2</title>
      <link>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142259#M51907</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/201317"&gt;@Maxrb&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P class="p1"&gt;Just thinking out loud here, but this might be worth experimenting with.&lt;/P&gt;
&lt;P class="p1"&gt;You could try using a Unity Catalog Volume as a lightweight package repository. Volumes can act as a secure, governed home for Python wheels (and JARs), and Databricks explicitly supports installing libraries directly from volume paths onto clusters, notebooks, and jobs. In fact, for UC-enabled workspaces, volumes are the recommended pattern for this exact use case.&lt;/P&gt;
&lt;P&gt;Just a thought.&lt;/P&gt;
&lt;P&gt;Cheers, Lou.&lt;/P&gt;</description>
      <pubDate>Fri, 19 Dec 2025 13:24:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142259#M51907</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-12-19T13:24:42Z</dc:date>
    </item>
    <item>
      <title>Re: pkgutils walk_packages stopped working in DBR 17.2</title>
      <link>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142265#M51910</link>
      <description>&lt;P&gt;&amp;nbsp;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Thanks for thinking along. The whole idea is that this package is not installed as a JAR, wheel, or anything else; it's a living module in my repository. For production I don't expect this issue, since I install my repo as a wheel file using Databricks Asset Bundles and expect it to still be discovered by pkgutil, but it's currently breaking while developing in Databricks. Note that locally, in VS Code using Databricks Connect, everything still works fine.&lt;/P&gt;&lt;P&gt;I checked all the updates in DBR 17.2 and couldn't find anything specifically related to this.&lt;/P&gt;&lt;P&gt;I don't have the capacity to investigate this any further right now, but I doubt the current behaviour is correct.&lt;/P&gt;&lt;P&gt;But again, thanks for thinking along!&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;Max&lt;/P&gt;</description>
      <pubDate>Fri, 19 Dec 2025 14:38:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pkgutils-walk-packages-stopped-working-in-dbr-17-2/m-p/142265#M51910</guid>
      <dc:creator>Maxrb</dc:creator>
      <dc:date>2025-12-19T14:38:37Z</dc:date>
    </item>
  </channel>
</rss>

