Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Intermittent failure with Python IMPORTS statements after upgrading to DBR18.0

thackman
New Contributor III

We have a Python module (WidgetUtil.py) that sits in the same folder as our notebook. For the past few years we have used a simple import statement to load it. Starting with DBR 18.0, the import fails intermittently (25% of the time) when running on job compute in PROD. It is 100% reliable when I use a personal/dedicated compute cluster in DEV. We rolled back to DBR 17.2 and the failures went away. Then we rolled forward to the DBR 18.1 Beta and the job started failing again. FYI: this is on Azure.

imports.png

image (1).png

I did some debugging with AI suggestions; the theory was that the FUSE mount was slow to come up. In the end that wasn't the case. We added a gatekeeper notebook at the start of the job that monitored the paths and waited for the FUSE mount to complete. What we found was that the directory was always available immediately, and the file was either readable right away or never readable at all. Waiting up to two minutes never fixed the issue.
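For readers who want to reproduce this kind of check, a gatekeeper along these lines could be sketched as follows. This is not the code from the screenshot; the function name and the default timeout and poll interval are assumptions chosen for illustration, and only the standard library is used.

```python
import time


def wait_until_readable(path, timeout_s=120.0, poll_s=5.0):
    """Poll until `path` can be opened and read, or until the timeout expires.

    Returns True as soon as one read succeeds, False if none did in time.
    (Hypothetical helper; names and defaults are illustrative assumptions.)
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with open(path, "rb") as fh:
                fh.read(1)  # force a real read, not just a directory lookup
            return True
        except OSError:
            pass  # missing or unreadable for now; retry until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Calling it with the full path to WidgetUtil.py at the start of the job distinguishes "readable after a delay" from "never readable". The behaviour described above corresponds to the function either returning True on the first attempt or returning False after the full timeout.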

TestCode.jpg

A job that succeeded.

WorkingRun.jpg

A job that failed

FailedRun.jpg

Why is importing a .py file unreliable now? 

 

2 REPLIES

Fabricio_Mattos
Databricks Employee

The issue is caused by changes in Databricks Runtime 18.x that make importing a plain .py file from the notebook’s folder unreliable on job compute, even though the same pattern still works consistently on a personal DEV cluster. In 18.x, the folder that contains your notebook (and WidgetUtil.py) is no longer consistently added to Python’s sys.path for jobs, so import WidgetUtil sometimes works and sometimes fails, even though the file is present and readable.
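One way to check this explanation on a failing run is to log what Python actually sees just before the import. The sketch below is illustrative only: describe_import_state is a hypothetical helper, not a Databricks API, and it relies only on the standard library.

```python
import importlib.util
import os
import sys


def describe_import_state(module_name, module_dir):
    """Collect facts that help explain why `import module_name` might fail.

    Hypothetical diagnostic helper; `module_dir` is the folder expected to
    contain `<module_name>.py`.
    """
    module_file = os.path.join(module_dir, module_name + ".py")
    try:
        # find_spec returns None when the name cannot be resolved via sys.path
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None
    return {
        "dir_on_sys_path": module_dir in sys.path,
        "file_exists": os.path.isfile(module_file),
        "resolved_origin": spec.origin if spec is not None else None,
    }
```

Printing describe_import_state("WidgetUtil", module_dir) at the top of the notebook, with module_dir set to the notebook's folder, records in the job log whether the folder was on sys.path and whether the file was visible for that run. If dir_on_sys_path is True and the import still fails, the missing sys.path entry is probably not the whole story.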


Explicitly add the module folder to sys.path

Use this when you want minimal structural changes and a fast fix.


import os
import sys

module_dir = "/Workspace/Shared/prod_utils"
if module_dir not in sys.path:
    sys.path.insert(0, module_dir)

import WidgetUtil

thackman
New Contributor III

Thanks for the suggestion, Fabricio. We tried using sys.path.insert and it didn't improve reliability. We found that converting some of the modules into notebooks improved reliability a lot. But we couldn't convert the other Python modules to notebooks because they are used in Python UDFs and we ran into pickle issues. Also, 18.1 is out of beta and seems slightly better than 18.0.

So overall our job now crashes 1-3 times per day, with 80% of our Python modules converted to notebooks and the remaining modules still imported, each with a sys.path.insert block before the import.