05-13-2025 08:20 AM
I have several notebooks that run code to ingest data from various APIs into our Data Warehouse. I have several modules that I reuse in multiple notebooks, such as Redshift functions, string-cleaning functions, and JSON-cleaning functions. Out of nowhere this morning, some notebooks started randomly failing to import modules, or to import functions from those modules.
In the example below, the code fails to import a function (which I confirmed exists and is correctly named).
All the Jobs are running on Serverless. When I run the same notebooks manually, there are no errors. Also, when I just click to "Repair Run" once it fails, it runs normally.
Does anyone have any idea what could possibly be happening?
05-13-2025 09:13 AM - edited 05-13-2025 09:13 AM
Thanks for sharing the error and the context. This intermittent module import issue in Databricks Serverless jobs is a known behavior in some environments, and here's what's likely going wrong:
Root Cause:
A race condition or cold-start issue in serverless clusters where:
-- The notebook starts executing before the module files in /Workspace/Tools are mounted/available.
-- Python import caches may be stale or inconsistent between jobs or cluster warmups.
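If stale import caches are the culprit, Python's `importlib.invalidate_caches()` can refresh the import system's view of directories on `sys.path`. A minimal self-contained sketch that simulates the cold-start scenario (the `json_tools_demo` module name is made up for this demo; in the real setup the directory would be `/Workspace/Tools`):

```python
import importlib
import os
import sys
import tempfile

# Simulate a module file that appears on disk only after the interpreter
# has already scanned sys.path (the serverless cold-start scenario).
tools_dir = tempfile.mkdtemp()
sys.path.append(tools_dir)

with open(os.path.join(tools_dir, "json_tools_demo.py"), "w") as f:
    f.write("def clean_dataframe_jsons():\n    return 'cleaned'\n")

# Without this call, a directory cached before the file existed can
# yield ImportError even though the file is now present on disk.
importlib.invalidate_caches()

import json_tools_demo
print(json_tools_demo.clean_dataframe_jsons())
```

Calling `importlib.invalidate_caches()` right before the failing import is a cheap first thing to try on the failing serverless runs.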
05-13-2025 09:30 AM
Thanks for your response, @lingareddy_Alva!
The code ran for months before starting to exhibit this behavior. Could something have changed now in Databricks?
And how do I fix this? Is a time.sleep(x) after I import the modules something that can help?
05-13-2025 09:44 AM
This is most likely caused by race conditions in cluster/job startup combined with dynamic module paths or
delayed workspace availability in Serverless or ephemeral job clusters. Specifically:
-- sys.path may not yet include /Workspace/Tools when the module is imported.
-- The underlying file system (e.g., DBFS mount of Workspace) might not be fully initialized at the exact moment the import is executed.
-- Workspace imports work fine in interactive sessions because the environment is already fully initialized.
Also, Databricks may have updated runtime behavior or tightened workspace initialization in recent releases,
which could expose previously hidden issues.
Fix Options
1. Use Absolute Imports with Workspace Directories
If your module is in /Workspace/Tools/json_tools.py, make sure you're importing it properly:
import sys
sys.path.append("/Workspace/Tools")
from json_tools import clean_dataframe_jsons
If this fails only sometimes, you can wrap it in retry logic (see below).
2. Retry-Import with time.sleep (Recommended in Your Case)
Yes, you can use time.sleep(x) combined with retry logic:
import time
import sys
sys.path.append("/Workspace/Tools")
retries = 3
for i in range(retries):
    try:
        from json_tools import clean_dataframe_jsons
        break  # success
    except ImportError as e:
        if i < retries - 1:
            print(f"Retrying import due to error: {e}")
            time.sleep(2)  # delay before retrying
        else:
            raise
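Since the same pattern is needed in multiple notebooks, the retry loop can be wrapped into one reusable helper built on `importlib`. A sketch, assuming the module directory is already on `sys.path` (the `import_with_retry` name is made up for illustration):

```python
import importlib
import time

def import_with_retry(module_name: str, retries: int = 3, delay: float = 2.0):
    """Import a module by name, retrying on ImportError (illustrative helper)."""
    for attempt in range(retries):
        try:
            # Refresh cached directory listings so files that appeared
            # after interpreter startup are seen on the next attempt.
            importlib.invalidate_caches()
            return importlib.import_module(module_name)
        except ImportError as e:
            if attempt == retries - 1:
                raise
            print(f"Retry {attempt + 1}/{retries} after ImportError: {e}")
            time.sleep(delay)

# Usage in a notebook (assumes /Workspace/Tools is already on sys.path):
# json_tools = import_with_retry("json_tools")
# clean_dataframe_jsons = json_tools.clean_dataframe_jsons
```

Returning the module object (rather than using `from ... import`) keeps the helper generic across all your shared modules.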
3. Move Reusable Code into a Wheel (.whl) File
If possible, package your shared functions into a .whl file and install it as a library to the job cluster via %pip install or job-level configuration.
This is the most reliable and scalable solution.
Example:
python setup.py bdist_wheel
Then install:
%pip install /Workspace/Tools/my_wheel_package.whl
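For the wheel approach, a minimal setup.py is enough to build the package with the `bdist_wheel` command above. A sketch, where the package name and layout are placeholders for your own shared modules:

```python
# setup.py -- minimal packaging sketch (names below are placeholders)
from setuptools import setup, find_packages

setup(
    name="dw_shared_tools",      # hypothetical package name
    version="0.1.0",
    packages=find_packages(),    # picks up e.g. dw_shared_tools/json_tools.py
    python_requires=">=3.8",
)
```

With this in place, `from dw_shared_tools.json_tools import clean_dataframe_jsons` works identically in every notebook once the wheel is installed, with no dependence on `sys.path` or workspace mount timing.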
4. Avoid Serverless if Determinism is Critical
Switch to a non-serverless job cluster or interactive cluster if possible,
as the Workspace file system is guaranteed to be available earlier in the execution lifecycle.