07-25-2023 07:20 AM - edited 07-25-2023 07:23 AM
I am baffled by the behaviour of Databricks:
Below you can see the contents of the directory using dbutils in Databricks. It shows the `test.xlsx` file clearly in the directory (and I can even open it using `dbutils.fs.head`). But when I go to use `pandas.read_excel` to read it, I get the error below stating it can't be found...
I am running the commands on a Unity Catalog/Shared Cluster with Databricks runtime 13.2 (Spark v3.4.0)
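Here is a minimal sketch of what I am doing (the `/FileStore/tables/` path is just an example; substitute your own):

```python
import pandas as pd

# dbutils and spark are available by default in a Databricks notebook
display(dbutils.fs.ls("dbfs:/FileStore/tables/"))    # test.xlsx shows up here
dbutils.fs.head("dbfs:/FileStore/tables/test.xlsx")  # this can read the file

# ...yet reading the same file via the /dbfs FUSE mount
# raises a "No such file or directory" error:
pdf = pd.read_excel("/dbfs/FileStore/tables/test.xlsx")
```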
07-25-2023 08:22 AM
Hi,
Could you please check whether any of the local file API limitations apply: https://docs.databricks.com/files/index.html#local-file-api-limitations
Also, could you please try writing the path with dbfs:/ at the beginning?
Please tag @Debayan with your next comment which will notify me. Thanks!
07-25-2023 09:07 AM
@Debayan
I get the following message:
Use "/dbfs", not "dbfs:": The function expects a local file path. The error is caused by passing a path prefixed with "dbfs:".
Having a look here https://learn.microsoft.com/en-us/azure/databricks/dbfs/unity-catalog, I read the following excerpt and I am wondering if I am missing something... like I said, I am using Unity Catalog on a shared cluster:
Shared access mode combines Unity Catalog data governance with Azure Databricks legacy table ACLs. Access to data in the hive_metastore is only available to users that have permissions explicitly granted.
To interact with files directly using DBFS, you must have ANY FILE permissions granted. Because ANY FILE allows users to bypass legacy tables ACLs in the hive_metastore and access all data managed by DBFS, Databricks recommends caution when granting this privilege.
Shared access mode does not support DBFS root or mounts.
07-26-2023 08:25 AM
Hi, an admin must grant SELECT on ANY FILE so the user in question can read files directly and create tables from them.
You can also refer to https://kb.databricks.com/en_US/data/user-does-not-have-permission-select-on-any-file.
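For reference, a sketch of the grant from that article (run as an admin; the user name is a placeholder):

```python
# ANY FILE bypasses legacy table ACLs in the hive_metastore,
# so grant it with caution
spark.sql("GRANT SELECT ON ANY FILE TO `user@example.com`")
```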
07-28-2023 01:20 PM
This did not do anything... as far as I know, the SELECT permission only pertains to tables in catalogs. Still can't read files from DBFS.
07-30-2023 10:16 PM
Hi, could you please raise a support case so that we can investigate and triage this?
08-11-2023 05:26 AM
This is a really frustrating design choice: on a Unity Catalog shared access mode cluster, Databricks disabled the /dbfs filesystem mount that allows DBFS to be read through vanilla Python, but left DBFS access in place for PySpark, because their implementation supports access control through Spark but not through Python.
As a workaround for reading data specifically, you can read the file with PySpark first and then convert it to a vanilla pandas object, but this does not work for all workflows and is very poorly documented.
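A sketch of that workaround, using a CSV for simplicity since Spark has no built-in Excel reader (reading xlsx through Spark needs something like the spark-excel library or pyspark.pandas); the path is a placeholder:

```python
# Read through Spark, which enforces UC access controls...
sdf = (spark.read
            .option("header", "true")
            .csv("dbfs:/FileStore/tables/test.csv"))

# ...then hand off to vanilla pandas. toPandas() collects everything
# to the driver, so this is only suitable for small files.
pdf = sdf.toPandas()
```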
08-15-2023 11:59 PM
We used to have this problem, but worked around it by having the files in a UC external location, and using spark pandas instead of regular pandas, since it uses the spark access model and this works with UC grants.
However, with the recent addition of UC Volumes, you can add the location as a volume and access it as you would a file in a regular file system.
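A sketch of both routes; the catalog/schema/volume names and the external location URL below are placeholders:

```python
import pandas as pd
import pyspark.pandas as ps

# UC Volume: exposed as a regular POSIX-style path, so plain pandas works
pdf = pd.read_excel("/Volumes/my_catalog/my_schema/my_volume/test.xlsx")

# Spark pandas against a UC external location (needs openpyxl for xlsx);
# this goes through the Spark access model, so UC grants apply
psdf = ps.read_excel("abfss://container@account.dfs.core.windows.net/raw/test.xlsx")
```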
11-02-2023 04:33 AM
Hey, I encountered this recently. I can see you are using a shared cluster; try switching to a single user cluster and that will fix it.
Can someone let me know why it wasn't working with a shared cluster?
Thanks.