Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DBR 17.3 – sporadic import failures from bundle

freddyT
New Contributor

Hi all,

For the past two days we've been getting sporadic Python import failures on DBR 17.3, with no change on our side. The errors are either "cannot import name ..." or "no module named ...", and they come and go: the same code fails on one run and passes on the next.

We are on Databricks Runtime 17.3.15, job compute (single node Standard_DC4as_v5).

Is anyone else seeing this today? Any known cause or workaround would be a big help. Thanks!

3 REPLIES

anagilla
Databricks Employee

Hi, a few Databricks-specific things to check here. "From bundle" plus imports that come and go on the same runtime usually points at how the wheel gets packaged and installed across runs rather than at your code.

On the runtime itself: DBR 17.3 is an LTS release (GA Oct 22, 2025, Spark 4.0.0), so a broad runtime-side import regression is unlikely. That makes a packaging or install-caching interaction the more probable cause, and those are fixable from the bundle side.

The most common cause of cannot import name ... / no module named ... that flips between runs with no code change is a stale wheel being installed on some runs but not others:

  1. Same wheel version across rebuilds. If your wheel keeps the same version (say 0.0.1) between deploys, pip sees the requirement as already satisfied and can skip reinstalling it on compute that is warm or reused, so a newly added or renamed symbol is missing on those runs. Databricks added a setting for exactly this: put dynamic_version: true on your whl artifact and each deploy patches the wheel version suffix from the file timestamp, so new code always installs without you bumping the version in setup.py/pyproject.toml. It shipped in Databricks CLI 0.245.0 and is documented under artifacts. There is also a ready-made preset, artifacts_dynamic_version, that the default-python template enables for classic compute (see Custom presets).

     artifacts:
       my_wheel:
         type: whl
         dynamic_version: true
         build: uv build --wheel
  2. More than one wheel left in dist/. If your library reference is a glob like whl: ./dist/*.whl and older wheels are still sitting in dist/, the install can match an older file on some runs. Clean dist/ before each build, or point at the exact filename, so only the current wheel is present. (This class of issue shows up in this community thread.)
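The "clean dist/ before each build" step can be scripted. The following is a minimal sketch, assuming the wheels land in a local dist/ directory next to the bundle; the function name remove_stale_wheels is hypothetical and not part of the Databricks CLI.

```python
from pathlib import Path


def remove_stale_wheels(dist_dir: str) -> list[str]:
    """Delete every .whl file in dist_dir and return the names removed.

    Intended to run before the build step, so that a glob such as
    ./dist/*.whl can only match the wheel produced by the current build.
    """
    dist = Path(dist_dir)
    if not dist.is_dir():
        return []  # nothing built yet, nothing to clean
    removed = []
    for wheel in sorted(dist.glob("*.whl")):
        wheel.unlink()
        removed.append(wheel.name)
    return removed
```

Calling remove_stale_wheels("dist") right before the build command leaves non-wheel files in place and returns the list of wheels it deleted, which is handy to print in CI logs.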

To confirm which run got which build, compare a passing run and a failing run: log the installed version of your package at the top of the task (importlib.metadata.version("<pkg>") or pip show <pkg>) and check it against the wheel version/timestamp your deploy produced. If passing and failing runs show different installed versions, it is the caching path above and dynamic_version will settle it.
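A minimal sketch of that version check, using only the standard library. The helper name installed_version and the distribution name "my_project" are placeholders; substitute the name your wheel is published under.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# "my_project" is a placeholder for your wheel's distribution name.
print(f"my_project installed version: {installed_version('my_project')}")
```

Put this at the top of the task so the value appears in the run output of both a passing and a failing run. A result of None means no wheel with that name is installed on that compute at all.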

On "is anyone else seeing this today": since it began about two days ago with no change on your side, it is worth ruling out a redeploy that reused the same version (a scheduled CI job, or a teammate's deploy) landing unevenly across warm job clusters. If the installed versions match on passing and failing runs and the wheel is clean, grab the run IDs of one passing and one failing run and open a support ticket, and check the Azure status page for your region.

(If your bundle deploys source files rather than a wheel, the same on-again/off-again behavior can instead come from sys.path/workspace-files resolution; the version-logging check above will tell you which path you are on.)
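For the source-files case, a small diagnostic can record what the interpreter sees when an import is about to run. This is only a sketch under assumptions: describe_import is a hypothetical helper, "src.utils" stands in for whichever of your modules fails, and it reports what Python resolves rather than explaining why a file might be missing.

```python
import importlib.util
import sys
from pathlib import Path


def describe_import(module_name: str) -> dict:
    """Collect facts about how `module_name` would be resolved right now."""
    try:
        # For dotted names, find_spec imports the parent package first and
        # raises ModuleNotFoundError if the parent cannot be found.
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None
    origin = spec.origin if spec is not None else None
    return {
        "module": module_name,
        "found": spec is not None,
        "origin": origin,
        # True only when the resolved source file is actually on disk.
        "origin_exists": bool(origin) and Path(origin).exists(),
        "sys_path": list(sys.path),
    }


# "src.utils" is a placeholder for a module inside your bundle's source tree.
print(describe_import("src.utils"))
```

Comparing this output between a passing and a failing run shows whether the difference is in sys.path (the directory is not on the path) or in file visibility (the directory is on the path but the module is not found).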

iyashk-DB
Databricks Employee

Hi, I think the accepted answer is pointing in the right direction.

Given that DBR 17.3 is an LTS runtime, this is more likely to be a packaging or deployment issue than a runtime regression.

One common cause is wheel caching. If your wheel version doesn't change between deployments, pip running on the cluster may determine that the package is already installed and skip reinstalling it. That can produce exactly the kind of intermittent behavior you're seeing.

If you're using Databricks Asset Bundles, consider enabling dynamic_version:

artifacts:
  my_project:
    type: whl
    build: python setup.py bdist_wheel
    dynamic_version: true

This feature (introduced in Databricks CLI 0.245.0) appends a unique timestamp to each generated wheel version so every deployment is treated as a new package and reinstalled accordingly.

It's also a good idea to clean the dist/ directory before each build. If your artifact path uses a wildcard, an older wheel remaining alongside the newly built one is another common source of inconsistent deployments.

Thanks for the reply. In our case it isn't wheel caching: we don't deploy a wheel. Our tasks are plain Python script tasks, deployed via a Databricks Asset Bundle directly into the workspace, e.g. /Workspace/Users/<user-id>/.bundle/project/Default/production/files/src/...
So there's no pip install, and we've already redeployed several times with no change.

The failing imports are for modules/functions inside the bundle's own source tree. It's intermittent and moves around: a task fails with "No module named ..." / "cannot import name ...", the next run passes it but fails another task the same way. That looks like files not being reliably visible at import time 🤔

I also noticed a recent maintenance update for 17.3 LTS, so I'm wondering if this could be related.

The same issue seems to have been reported here by someone else (also two days ago), and I've added a message there too.