Shared cluster configuration that permits `dbutils.fs` commands

Spencer_Kent
New Contributor III

My workspace has a couple of different types of clusters, and I'm having issues using the `dbutils` filesystem utilities when connected to a shared cluster. I'm hoping you can help me fix the shared cluster's configuration so that I can actually use the `dbutils` filesystem commands.

The workspace is set up to use Unity Catalog, and I'm not sure if that has anything to do with the error.

When I try to `ls` the DBFS root location I get an "INSUFFICIENT_PERMISSIONS" Spark security exception:

[screenshot: insufficient_permissions_on_shared_cluster]
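
In case the screenshot doesn't come through, this is the minimal repro I'm running on that cluster (the try/except is just to surface the error; nothing else is in the notebook):

```python
# Minimal repro on the shared (USER_ISOLATION) cluster.
# The screenshot shows a Spark security exception reporting
# INSUFFICIENT_PERMISSIONS when listing the DBFS root.
try:
    display(dbutils.fs.ls("/"))
except Exception as e:
    print(type(e).__name__)
    print(e)
```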

The cluster this happens on is a Shared cluster with the data security mode set to "USER_ISOLATION" via Terraform. The screenshot below says "Unrestricted," but the data security mode is set in Terraform.

[screenshot: shared_cluster_config]

This error does not occur on a Single User cluster with the Individual use policy:

[screenshot: individual_use_cluster]

Can you give me guidance on how to configure the shared cluster so that `dbutils.fs.ls("/")` won't error with insufficient permissions?

Thank you so much!

7 REPLIES

Kaniz
Community Manager

Hi @Spencer Kent, the "INSUFFICIENT_PERMISSIONS" Spark security exception typically occurs when the user executing the code does not have sufficient permissions to access the requested resource. In this case, your user account lacks the privileges needed to access the DBFS root location.

I see you're facing this error while using a shared cluster; check with your system administrator or the cluster owner to confirm that your user account has the required permissions.

Anonymous
Not applicable

Hi @Spencer Kent,

We haven't heard from you since the last response from @Kaniz Fatma, and I was checking back to see if her suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

Spencer_Kent
New Contributor III

Unfortunately the suggestion was not helpful. It is no mystery what the error is (insufficient permissions to access the DBFS root location). What remains a mystery, and the point of my question, is whether a particular shared-cluster configuration is required to make the DBFS root location accessible. I'm asking in this forum because I have not yet found this discussed in the Databricks documentation, and because the Terraform provider I'm using does not list any cluster configuration options that seem relevant to making the DBFS root location (or any of the DBFS filesystem utilities) available on a shared cluster.

User16623639898
New Contributor II

Hi @Spencer_Kent,
Please go through this: https://learn.microsoft.com/en-us/azure/databricks/dbfs/unity-catalog

Shared access mode combines Unity Catalog data governance with Azure Databricks legacy table ACLs. Access to data in the hive_metastore is only available to users that have permissions explicitly granted.

To interact with files directly using DBFS, you must have ANY FILE permissions granted. Because ANY FILE allows users to bypass legacy table ACLs in the hive_metastore and access all data managed by DBFS, Databricks recommends caution when granting this privilege.

 

Clusters configured with Single User access mode have full access to DBFS, including all files in the DBFS root and mounted data. DBFS root and mounts are available in this access mode, making it the choice for ML workloads that need access to Unity Catalog datasets.
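
As a quick illustration, something like this works from a notebook on a Single User cluster (a minimal sketch; the calls simply list the DBFS root and any existing mounts):

```python
# On a cluster with Single User access mode, the DBFS root and mounts are reachable.
for f in dbutils.fs.ls("/"):        # list the DBFS root
    print(f.path, f.size)

for m in dbutils.fs.mounts():       # enumerate existing DBFS mounts
    print(m.mountPoint, "->", m.source)
```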

Databricks recommends using service principals with scheduled jobs and Single User access mode for production workloads that need access to data managed by both DBFS and Unity Catalog.

yosinv
New Contributor II

Hello,

I'm encountering a similar issue. We have a team of researchers utilizing a shared cluster without access to the Hive Metastore. I've looked through the documentation, but there doesn't seem to be a way to define or grant "ANY FILE" during the cluster initialization process.
Moreover, what if I'm looking to access the S3 bucket path itself? What is the approach to define that?
Please advise.

drii_cavalcanti
New Contributor III

I could not find the `ANY FILE` permission either.

Nikhil_G
New Contributor II

There are two ways to grant access to DBFS using ANY FILE (see the notebook sketch below):

  1. To a user: GRANT SELECT ON ANY FILE TO `<user_mail_id>`
  2. To a group: GRANT SELECT ON ANY FILE TO `<group_name>`
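
From a notebook attached to the shared cluster, that looks roughly like this. A minimal sketch, assuming you run it as a workspace admin; the principals below are placeholders, so substitute your own user email or group name:

```python
# Hypothetical sketch: grant the legacy ANY FILE privilege so DBFS paths
# become reachable from a shared (USER_ISOLATION) cluster.
spark.sql("GRANT SELECT ON ANY FILE TO `user@example.com`")  # placeholder user
spark.sql("GRANT SELECT ON ANY FILE TO `data_team`")         # placeholder group

# Optional: confirm the grant, assuming the legacy table-ACL SHOW GRANT
# syntax is available in your workspace.
spark.sql("SHOW GRANT ON ANY FILE").show(truncate=False)

# The original call should now succeed instead of raising INSUFFICIENT_PERMISSIONS.
display(dbutils.fs.ls("/"))
```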