Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Job run failing to import modules

marcio_oliveira
New Contributor II

I have several notebooks that run code to ingest data from various APIs into our Data Warehouse. I have several modules that I reuse across notebooks: things like Redshift functions, string-cleaning functions and JSON-cleaning functions. Out of nowhere this morning, some notebooks started randomly failing to import modules, or failing to import functions from those modules.

In the example below, the code fails to import a function (which I confirmed exists and is correctly named).

[Screenshot attached: marcio_oliveira_0-1747149522503.png]

All the Jobs are running on Serverless. When I run the same notebooks manually, there are no errors. Also, when I just click "Repair Run" after a failure, it runs normally.

Does anyone have any idea what could be happening?


3 REPLIES

lingareddy_Alva
Honored Contributor II

Hi @marcio_oliveira 

Thanks for sharing the error and the context. This intermittent module import issue in Databricks Serverless jobs is a known behavior in some environments, and here's what's likely going wrong:

Root Cause:
A race condition or cold-start issue in serverless clusters where:
-- The notebook starts executing before the module files in /Workspace/Tools are mounted/available.
-- Python import caches may be stale or inconsistent between jobs or cluster warmups.
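
If it helps to confirm this diagnosis, a small check can be run as the first cell of the failing job, before anything is imported from the shared folder. This is only a diagnostic sketch; /Workspace/Tools is the path referenced later in this thread, so adjust it to your layout:

import os
import sys

TOOLS_DIR = "/Workspace/Tools"  # shared-module folder mentioned in this thread

# Log what the Serverless environment looks like before any shared-module imports.
print("sys.path entries:", sys.path)
print(f"{TOOLS_DIR} exists:", os.path.exists(TOOLS_DIR))
if os.path.exists(TOOLS_DIR):
    print("contents:", sorted(os.listdir(TOOLS_DIR)))

If the folder sometimes shows up as missing or empty at job start but is present when you rerun the cell, that points to the cold-start/race condition described above.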

 

LR

marcio_oliveira

Thanks for your response, @lingareddy_Alva!
The code ran for months before it started exhibiting this behavior. Could something have changed recently in Databricks?
And how do I fix this? Is a time.sleep(x) after I import the modules something that can help?

lingareddy_Alva
Honored Contributor II

ACCEPTED SOLUTION

Hi @marcio_oliveira

This is most likely caused by race conditions in cluster/job startup combined with dynamic module paths or
delayed workspace availability in Serverless or ephemeral job clusters. Specifically:
-- sys.path may not yet include /Workspace/Tools when the module is imported.
-- The underlying file system (e.g., DBFS mount of Workspace) might not be fully initialized at the exact moment the import is executed.
-- Workspace imports work fine in interactive sessions because the environment is already fully initialized.

Also, Databricks may have updated runtime behavior or tightened workspace initialization in recent releases,
which could expose previously hidden issues.

Fix Options
1. Use Absolute Imports with Workspace Directories
If your module is in /Workspace/Tools/json_tools.py, make sure you're importing it properly:

import sys
sys.path.append("/Workspace/Tools")
from json_tools import clean_dataframe_jsons

If this fails only sometimes, you can wrap it in retry logic (see option 2 below); a slightly more defensive variant of the import above follows.
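
This variant uses the same path and function name as the snippet above, but only appends the directory when it is not already on sys.path, so repeated runs and "Repair Run" retries don't keep adding duplicate entries:

import sys

TOOLS_DIR = "/Workspace/Tools"  # same folder as in the snippet above

# Append the Workspace folder only once per interpreter session.
if TOOLS_DIR not in sys.path:
    sys.path.append(TOOLS_DIR)

from json_tools import clean_dataframe_jsons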

2. Retry-Import with time.sleep (Recommended in Your Case)
Yes, you can use time.sleep(x) combined with retry logic:

import time
import sys

sys.path.append("/Workspace/Tools")

retries = 3
for i in range(retries):
    try:
        from json_tools import clean_dataframe_jsons
        break  # success
    except ImportError as e:
        if i < retries - 1:
            print(f"Retrying import due to error: {e}")
            time.sleep(2)  # delay before retrying
        else:
            raise
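
If several notebooks need the same guard, the retry can live in a small helper instead of being copied around. This is only a sketch under the same assumptions as above (shared modules in /Workspace/Tools); importlib.invalidate_caches() is included because the import system caches directory listings, which can matter if the Workspace path appears only after the interpreter has started:

import importlib
import sys
import time

def import_with_retry(module_name, path="/Workspace/Tools", retries=3, delay=2):
    """Import module_name from path, retrying if the folder is slow to appear."""
    if path not in sys.path:
        sys.path.append(path)
    last_error = None
    for attempt in range(retries):
        try:
            importlib.invalidate_caches()  # make the path finders re-scan sys.path entries
            return importlib.import_module(module_name)
        except ImportError as e:
            last_error = e
            if attempt < retries - 1:
                print(f"Import of {module_name} failed ({e}), retrying...")
                time.sleep(delay)
    raise last_error

# Example usage with the module and function named earlier in the thread:
json_tools = import_with_retry("json_tools")
clean_dataframe_jsons = json_tools.clean_dataframe_jsons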


3. Move Reusable Code into a Wheel (.whl) Package
If possible, package your shared functions into a .whl file and install it as a library on the job cluster via %pip install or job-level configuration.
This is the most reliable and scalable solution.

Example:
python setup.py bdist_wheel

Then install:

%pip install /Workspace/Tools/my_wheel_package.whl
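
For reference, the build command above assumes a setup.py alongside the shared modules. A minimal sketch, with names taken from the examples in this thread rather than your actual project:

# setup.py -- minimal packaging sketch (names are illustrative, based on the examples above)
from setuptools import setup

setup(
    name="my_wheel_package",      # mirrors the wheel name in the %pip install line above
    version="0.1.0",
    py_modules=["json_tools"],    # add your other shared modules (e.g. Redshift/string helpers) here
)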

4. Avoid Serverless if Determinism is Critical
Switch to a non-serverless job cluster or interactive cluster if possible,
as the Workspace file system is guaranteed to be available earlier in the execution lifecycle.

LR
