Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks DBR 18.1 access workspace files error

der
Contributor III
import json

with open("/Workspace/Users/<USER>/config.json", "r") as f:
    config = json.load(f)
    print(config)

Throws the following error:

OSError: [Errno 5] Input/output error: '/Workspace/Users/<USER>/config.json'
[Trace ID: 00-874e2bc3d747c3611c0c4adf64f48620-ed365d30ae67cc7f-00]
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
File <command-6408169302940046>, line 3
      1 import json
----> 3 with open("/Workspace/Users/<USER>/config.json", "r") as f:
      4     config = json.load(f)
      6 print(config)

File /databricks/python/lib/python3.12/site-packages/IPython/core/interactiveshell.py:324, in _modified_open(file, *args, **kwargs)
    317 if file in {0, 1, 2}:
    318     raise ValueError(
    319         f"IPython won't let you open fd={file} by default "
    320         "as it is likely to crash IPython. If you know what you are doing, "
    321         "you can use builtins' open."
    322     )
--> 324 return io_open(file, *args, **kwargs)

OSError: [Errno 5] Input/output error: '/Workspace/Users/<USER>/config.json'

Did something change with DBR 18.1 and workspace file access?

I see no such limitation in the documentation:
https://docs.databricks.com/aws/en/files/#work-with-workspace-files


Ashwin_DSA
Databricks Employee

Hi @der ,

I have checked internally. No specific changes to workspace files access were introduced as part of DBR 18.1. The error seems to indicate that the mount/backend is unhealthy or blocked (network/NSG/cluster config), not that your Python code is wrong.

Can you try the following?

import getpass, subprocess

user = getpass.getuser()
path = f"/Workspace/Users/{user}"

# Does listing the directory itself fail?
print("Listing:", path)
print(subprocess.run(["ls", "-l", path], text=True, capture_output=True))

If "ls" itself throws the same error, the /Workspace mount is broken on this cluster.

You can also spin up a new cluster with minimal config (no custom VNet/NSG changes, no init scripts, no extra drivers) and run the same open("/Workspace/...") there. If it works on the clean cluster but not on the original one, itโ€™s a cluster/network configuration issue.

A robust workaround is to store configs on DBFS instead of raw /Workspace/Users/.. paths. However, if you must use workspace files, the officially supported way is through the Workspace Files API (WorkspaceClient().files) rather than relying on FUSE paths: use the SDK to download the file to local disk, then json.load it.
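The SDK route could look roughly like the sketch below. This is an illustration, not an official recipe: the helper name and the injectable `client` parameter are my own, and it assumes a recent `databricks-sdk` version where `workspace.download` returns a binary stream.

```python
import io
import json

def load_workspace_json(path, client=None):
    """Read a JSON workspace file via the Workspace API instead of the FUSE mount.

    `path` is workspace-relative, e.g. "/Users/<USER>/config.json".
    `client` defaults to a databricks.sdk.WorkspaceClient and is injectable
    purely so the helper can be tested without a live workspace.
    """
    if client is None:
        # Lazy import so the helper does not require the SDK at import time.
        from databricks.sdk import WorkspaceClient
        client = WorkspaceClient()
    # workspace.download returns a binary stream usable as a context manager.
    with client.workspace.download(path) as raw:
        return json.load(io.TextIOWrapper(raw, encoding="utf-8"))
```

On a cluster this would be called as `load_workspace_json("/Users/<USER>/config.json")`, bypassing the /Workspace FUSE mount entirely.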

If the above doesn't resolve your issue, this is a good candidate for a Databricks support case. Attach the trace ID (which you already have), the cluster ID, and the workspace URL, and confirm that any customer-managed NSGs allow egress to workspace APIs (including /api/2.0/workspace-files/*).

If this answer resolves your question, could you mark it as โ€œAccept as Solutionโ€? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

der
Contributor III

Hi @Ashwin_DSA 

I restarted the cluster and it worked again. Across multiple restarts it works most of the time but sometimes fails. We see this issue only with DBR 18.1 (Standard with Photon, Dedicated without Photon).

DBFS is deactivated. 

Ashwin_DSA
Databricks Employee

Hi @der,

Can you run the same notebook on a test cluster in the same workspace with DBR 18.0 (or 17.3 LTS) with same access mode, DBFS still disabled? If the issue disappears there and only occurs on 18.1, thatโ€™s strong evidence for an 18.1โ€‘specific issue.

If you hit the error on 18.1, capture the trace ID from the stack trace and immediately run the below in a Python cell:

import getpass, subprocess

user = getpass.getuser()
base = f"/Workspace/Users/{user}"

print("ls base:")
print(subprocess.run(["ls", "-l", base], text=True, capture_output=True))

print("ls file:")
print(subprocess.run(["ls", "-l", "/Workspace/Users/<USER>/config.json"], text=True, capture_output=True))

In a %sh cell, check for FUSE / workspace-files errors:

dmesg | tail -n 50 || true

You can then open a support ticket with: the workspace URL and region, the cluster ID, the DBR version (18.1), the access mode, a note that DBFS is disabled, the full Python stack trace with trace ID(s), the output of the ls/dmesg checks above from a failing run, and confirmation that the same code on 18.0 or 17.3 LTS does not show Errno 5.

That gives support/engineering exactly what they need to confirm a regression in 18.1โ€™s WSFS/FUSE behaviour.

If this answer resolves your question, could you mark it as โ€œAccept as Solutionโ€? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

der
Contributor III

I will try this code the next time it happens and keep you updated.

der
Contributor III

Today I see the issue again on a dedicated cluster with DBR 18.1, with DBFS deactivated at the workspace level.

Accessing the file with Python results in:

FileNotFoundError: [Errno 2] No such file or directory

Both subprocess runs can access the main folder ("base") and the config.json ("file").

The dmesg command does not work (even with sudo):

sudo dmesg | tail -n 50 || true
dmesg: read kernel buffer failed: Operation not permitted

dbutils works too:

dbutils.fs.ls("file:/Workspace/Users/<USER>")

and native listing of the directory also works:

import os
files = os.listdir("/Workspace/Users/<USER>")

So it looks to me like something in /databricks/python/lib/python3.12/site-packages/IPython/core/interactiveshell.py is not working correctly with workspace files:

File /databricks/python/lib/python3.12/site-packages/IPython/core/interactiveshell.py:324, in _modified_open(file, *args, **kwargs)
    317 if file in {0, 1, 2}:
    318     raise ValueError(
    319         f"IPython won't let you open fd={file} by default "
    320         "as it is likely to crash IPython. If you know what you are doing, "
    321         "you can use builtins' open."
    322     )
--> 324 return io_open(file, *args, **kwargs)
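Since the failure is intermittent, one possible interim mitigation (a sketch only; the helper name, retry counts, and the injectable `opener` parameter are illustrative, not anything official) is to retry the open when a transient OSError occurs:

```python
import time

def read_with_retry(path, attempts=3, delay=0.5, opener=open):
    """Retry reads that fail with a transient OSError (e.g. Errno 5).

    Workaround sketch only; `opener` defaults to the builtin open and is
    injectable purely for testing.
    """
    last_err = None
    for attempt in range(attempts):
        try:
            with opener(path) as f:
                return f.read()
        except OSError as err:
            last_err = err
            time.sleep(delay * (2 ** attempt))  # simple exponential backoff
    raise last_err
```

This does not address the root cause, but it may keep jobs alive while the underlying WSFS/FUSE issue is investigated.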


Youry
New Contributor II

We have a similar issue accessing workspace files, specifically on DBR 18.1 (previous DBR 17.x versions worked fine, even with relative paths).

import yaml
import os

def settings():
    try:
        script_dir = os.path.dirname(os.path.abspath(__file__))
    except NameError:
        script_dir = os.getcwd()
    config_path = os.path.normpath(
        os.path.join(script_dir, "..", "config", "config.yml")
    )
    with open(config_path, "r") as f:  # this open raises the OSError below
        return yaml.safe_load(f)

At the end: OSError: [Errno 5] Input/output error: '/Workspace/code/config/config.yml' [Trace ID: 00-188a04ab49de4ed3b4ab6e0e3ba17e58-b94fcf438d525fb0-00]

It looks like the file cannot be opened even though its fully qualified path resolves correctly. Maybe some stricter rules were introduced in DBR 18.x?


Ashwin_DSA
Databricks Employee

Hi @Youry,

Thank you for sharing this. I will try to flag it internally, but I highly recommend that you raise a support ticket to ensure it gets prioritised.

If this answer resolves your question, could you mark it as โ€œAccept as Solutionโ€? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

Ashwin_DSA
Databricks Employee

@der @Youry - FYI, I've now flagged this internally as well.


Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

Youry
New Contributor II

Thanks!

However, if the only documented option is to use the Workspace Files API, then we'll probably have to adjust our code accordingly. FYI, we're using /Workspace/code/ as part of a CI/CD process: code and configs are landed into a subdirectory by the pipeline. Moving part of it to DBFS adds complexity.

Ashwin_DSA
Databricks Employee

Hi @Youry,

Understood. Did you happen to raise a support ticket? If so, could you please share the reference number? It will help me to escalate the issue.


Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

der
Contributor III

Hi @Ashwin_DSA 

I created a support ticket and will provide you with the reference number as soon as I get it.

Malthe
Valued Contributor

This happens to us on 18.0.3, but I haven't seen it on < 18.