Intermittent failure with Python import statements after upgrading to DBR 18.0

thackman
New Contributor III

We have a Python module (WidgetUtil.py) that sits in the same folder as our notebook. For the past few years we have used a simple import statement to pull it in. Starting with DBR 18.0, the import fails intermittently (about 25% of the time) when running on job compute in PROD, but it is 100% reliable on a personal/dedicated compute cluster in DEV. We rolled back to DBR 17.2 and the failures went away; we then rolled forward to DBR 18.1 Beta and the job started failing again. FYI: this is on Azure.

[Attachment: imports.png]

[Attachment: image (1).png]

I did some debugging with AI suggestions; the theory was that the FUSE mount was slow to come up. That turned out not to be the case. We added a gatekeeper notebook at the start of the job that monitored the paths and waited for the FUSE mount to complete. What we found was that the directory was always immediately available, and the file was either readable right away or never readable at all. Waiting up to two minutes never fixed the issue.

[Attachment: TestCode.jpg]

[Attachment: WorkingRun.jpg — a job that succeeded]

[Attachment: FailedRun.jpg — a job that failed]

Why is importing a .py file unreliable now?