
'No file or Directory' error when using pandas.read_excel in Databricks

anirudh_a
New Contributor II

I am baffled by the behaviour of Databricks:

Below you can see the contents of the directory using dbutils in Databricks. It clearly shows the `test.xlsx` file in the directory (and I can even open it using `dbutils.fs.head`). But when I use `pandas.read_excel` to read it, I get the error below stating it can't be found...

I am running the commands on a Unity Catalog shared cluster with Databricks Runtime 13.2 (Spark 3.4.0).

[Screenshot: dbutils.fs.ls output listing test.xlsx, followed by the FileNotFoundError raised by pandas.read_excel]
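The mismatch comes from the two path schemes in play: `dbutils.fs` speaks `dbfs:/` URIs, while plain pandas (via openpyxl) goes through the driver's local filesystem, where DBFS is exposed at `/dbfs` on clusters that have that FUSE mount. A minimal sketch of the translation; the helper name and example path are my own, not a Databricks API:

```python
def dbfs_to_local(path: str) -> str:
    """Translate a dbfs:/ URI into the /dbfs FUSE path that local-file
    APIs like pandas.read_excel expect. (Illustrative helper only; on
    shared-access-mode clusters the /dbfs mount itself is unavailable,
    which is the actual problem in this thread.)"""
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):].lstrip("/")
    return path

# dbutils.fs.ls("dbfs:/tmp/test.xlsx")                  # DBFS URI API: works
# pd.read_excel("dbfs:/tmp/test.xlsx")                  # pandas wants a local path: fails
# pd.read_excel(dbfs_to_local("dbfs:/tmp/test.xlsx"))   # "/dbfs/tmp/test.xlsx"
```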

1 ACCEPTED SOLUTION


knutasm
New Contributor III

We used to have this problem, but worked around it by putting the files in a UC external location and using the pandas API on Spark (pyspark.pandas) instead of regular pandas, since it goes through the Spark access model and therefore works with UC grants.

However, with the recent addition of UC Volumes, you can add the location as a volume and access it as you would a file in a regular file system.
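A sketch of both routes; the catalog, schema, volume, and storage path names are invented for illustration, and pyspark.pandas.read_excel needs openpyxl installed on the cluster:

```python
import posixpath

def uc_volume_path(catalog: str, schema: str, volume: str, filename: str) -> str:
    """Build a UC Volumes path, which behaves like a plain file path on the
    cluster. (All names here are placeholders; substitute your own.)"""
    return posixpath.join("/Volumes", catalog, schema, volume, filename)

# Route 1: pandas API on Spark against a UC external location (UC grants apply):
# import pyspark.pandas as ps
# psdf = ps.read_excel("abfss://container@account.dfs.core.windows.net/raw/test.xlsx")

# Route 2: UC Volumes - the volume is visible as a regular file system,
# so vanilla pandas works directly:
# import pandas as pd
# pdf = pd.read_excel(uc_volume_path("main", "default", "landing", "test.xlsx"))
```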


8 REPLIES

Debayan
Esteemed Contributor III

Hi,

Could you please check whether the local file API limitations listed here apply: https://docs.databricks.com/files/index.html#local-file-api-limitations

Also, could you please try writing the path with dbfs:// at the beginning?

Please tag @Debayan  with your next comment which will notify me. Thanks!

anirudh_a
New Contributor II

@Debayan 
I get the following message:

Use "/dbfs", not "dbfs:": The function expects a local file path. The error is caused by passing a path prefixed with "dbfs:".

Having a look at https://learn.microsoft.com/en-us/azure/databricks/dbfs/unity-catalog, I read the following excerpt and I am wondering if I am missing something... like I said, I am using Unity Catalog on a shared cluster:

How does DBFS work in shared access mode?

Shared access mode combines Unity Catalog data governance with Azure Databricks legacy table ACLs. Access to data in the hive_metastore is only available to users that have permissions explicitly granted.

To interact with files directly using DBFS, you must have ANY FILE permissions granted. Because ANY FILE allows users to bypass legacy tables ACLs in the hive_metastore and access all data managed by DBFS, Databricks recommends caution when granting this privilege.

Shared access mode does not support DBFS root or mounts.

Debayan
Esteemed Contributor III

Hi, an admin must grant the SELECT permission on ANY FILE so that the selected user can read files directly.

You can also refer to https://kb.databricks.com/en_US/data/user-does-not-have-permission-select-on-any-file
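The grant from that KB article can be issued from a notebook as an admin; a sketch, where the principal is a placeholder and the helper just formats the statement:

```python
def grant_any_file_sql(principal: str) -> str:
    """Format the legacy table-ACL grant from the linked KB article.
    (The principal is a placeholder; an admin would run the result
    with spark.sql on the workspace in question.)"""
    return f"GRANT SELECT ON ANY FILE TO `{principal}`"

# In a Databricks notebook, an admin would run:
# spark.sql(grant_any_file_sql("user@example.com"))
```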

anirudh_a
New Contributor II

This did not do anything... as far as I know, the SELECT permission only pertains to tables in catalogs. I still can't read files from DBFS.

Debayan
Esteemed Contributor III

Hi, could you please raise a support case so that we can investigate and triage this?

JameDavi_51481
New Contributor III

This is a really frustrating design choice: on a Unity Catalog shared-access-mode cluster, Databricks disabled the filesystem mount that allows DBFS to be read through vanilla Python, but left it in place for PySpark, because their implementation supports access control through Spark but not through Python.

As a workaround for reading data specifically, you can read the file with PySpark first and then convert it to a vanilla pandas object, but this does not work for all workflows and is very poorly documented.


DamnKush
New Contributor II

Hey, I encountered this recently. I can see you are using a shared cluster; try switching to a single-user cluster and that will fix it.

Can someone let me know why it wasn't working with a shared cluster?

Thanks.
