I developed a notebook that imports custom modules from *.py files. On my cluster, the notebook works fine: my custom modules load and the code executes.
Using an AzDO pipeline, I deploy the notebook and supporting files to a separate workspace that acts as a test environment. When the notebook runs there, it fails with an Errno 95 error while loading modules; the failure happens on a module import statement.
I've opened the files in the test environment and confirmed that all of them are physically present. From Python in the notebook, I verify that the folders and files exist using os.path.exists(...).
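The check looks roughly like this; the paths and module name below are placeholders, not my real ones:

```python
import os
import sys

# Placeholder paths/names -- simplified from the real project layout.
module_dir = "/Workspace/Shared/my_project/modules"

# Both of these report that the deployed files are present in Workspace B...
print(os.path.exists(module_dir))                     # True
print(os.path.exists(f"{module_dir}/my_helpers.py"))  # True

# ...yet this is the line that raises the Errno 95 error in Workspace B.
sys.path.append(module_dir)
import my_helpers
```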
For Workspace B (test), I'm using the exact same cluster configuration as the one I developed on in Workspace A.
I asked two LLMs, and both claim that the workspace file system is virtualized; ergo, there are no guarantees that Python can access local *.py files. Even if os.path.exists() returns True, the file may not really be there. So in Workspace A, my development environment, the files exist and the notebook runs; in Workspace B (test), they may or may not, because of the magic of "back planes", synchronization, etc.
Both LLMs advise that DBFS is the most reliable place to store Python modules.
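If I understand that suggestion correctly, it amounts to something like the sketch below, reading the modules through the /dbfs FUSE mount; the DBFS path and module name are hypothetical:

```python
import sys

# Hypothetical location the pipeline would copy the modules to on DBFS,
# accessed through the /dbfs FUSE mount on the driver.
sys.path.append("/dbfs/FileStore/my_project/modules")

import my_helpers  # assumes my_helpers.py was copied there by the pipeline
```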
The Databricks documentation and the CLI lead you to believe that the workspace can hold arbitrary file types -- it can, and that's how I deploy to test. But the files aren't really usable: I can open the folders in the workspace and see the *.py files there, yet they either aren't really there or are inaccessible to Python in that workspace.
According to ChatGPT 4.1, even if my team adopted asset bundles, there are no guarantees that a local Python file will really be there.
I've been doing software dev and ETLs for a long time and am frankly amazed at the learning curve involved. "Well, it's a file system but not really a file system and you may/may not get stuff pushed correctly using our tools; or they are pushed correctly but don't work reliably because of the back plane, etc."
Some of the documentation indicates DBFS is no longer recommended. Workspaces are not practical because, according to the LLMs, they exist primarily to support notebooks. Python is not reliable because it can tell you a file or folder exists, and even list it, yet it may or may not actually be there because of the complexity of a distributed file system, etc.
If the problem is a permissions issue in Workspace B, the error message doesn't indicate this at all.
Wheel files are an option, but they're really onerous for us since we're not using Unity Catalog at the moment.
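For completeness, my understanding is that the wheel route would look something like this inside the notebook; the path and package name are hypothetical:

```python
# Cell 1 -- install a wheel the pipeline uploaded (path and package name are hypothetical)
%pip install /Workspace/Shared/my_project/dist/my_helpers-0.1.0-py3-none-any.whl

# Cell 2 -- import as usual once the install completes
import my_helpers
```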
My team is really struggling with Databricks as a whole. I'm finding it to be an anti-pattern for SDLC activities, but I'm looking for a different perspective. Please do not reply with "use VS Code with the Databricks plugin." That is not an option for us, and the guidance for other scenarios is very confusing (don't use DBFS, WSFS is not reliable, etc.)