Data Engineering

Users are failing to query data from S3 bucket

164079
Contributor II

Hi team,

Users are unable to run SELECT on data located in S3 buckets, even though the S3 permissions are OK.

The only way they manage to do it is when they are granted the Databricks workspace admin permission.

The error is attached.

Thanks!


13 REPLIES

Pat
Honored Contributor III

Hi @Avi Edri,

It looks like your users are missing the SELECT permission on ANY FILE, which admins are granted by default. Please see here for more details:

https://docs.databricks.com/security/access-control/table-acls/object-privileges.html

image 

I assume you are not using Unity Catalog. Without Unity Catalog it's not easy to support both kinds of access: through tables and through file paths (spark.read... ).

You might need to revisit how data is accessed on your side. I believe that on a cluster with Table Access Control enabled you are limited to querying tables (select * from some_table) unless you have the SELECT permission on ANY FILE, which lets you bypass this restriction.

Unity Catalog is the way forward: it enables more security and allows some flexibility here.
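
To make the two access paths concrete, here is a minimal Python sketch, assuming a notebook-style `spark` session is passed in. The function names, the table name, the parquet format, and the bucket path in the comment are all hypothetical and only illustrate the distinction described above.

```python
def read_via_table(spark, table_name):
    """Table access: goes through the metastore, so table ACLs (SELECT on the table) apply."""
    return spark.sql(f"SELECT * FROM {table_name}")


def read_via_path(spark, path):
    """Direct file access: per the reply above, on a Table Access Control
    cluster this is the case that needs SELECT on ANY FILE."""
    return spark.read.format("parquet").load(path)


# Hypothetical usage in a notebook where `spark` already exists:
#   read_via_table(spark, "sales.orders")
#   read_via_path(spark, "s3://example-bucket/orders/")
```

If users only ever need the first form, granting table-level privileges may be enough and ANY FILE can stay restricted.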

thanks,

Patryk.

164079
Contributor II

Thank you Pat,

Can you please guide me on how to grant the ANY FILE permission to my users or groups?

Also, is there a way to grant SELECT on all databases via a SQL command or the terminal?

We are not using Unity Catalog, and table access control is enabled for our workspace.

Avi

Pat
Honored Contributor III

This should work with SQL (in a notebook):

GRANT SELECT ON ANY FILE TO `group-name`

or maybe this with Terraform:

resource "databricks_sql_permissions" "any_file" {
  any_file = true

  privilege_assignments {
    principal  = "group-name"
    privileges = ["SELECT"]
  }
}

I haven't tried the Terraform one.

https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/sql_permissions#...
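
For the "all databases" part of the question, one possible approach is to generate one pair of GRANT statements per database. The sketch below is untested against a real workspace. It assumes the legacy table ACL syntax `GRANT ... ON DATABASE ... TO ...` and that USAGE on the database is needed alongside SELECT. The helper names and the database names are made up for illustration.

```python
def quote_ident(name):
    """Wrap a name in backticks, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"


def build_select_grants(databases, principal):
    """Return GRANT statements giving `principal` USAGE and SELECT on each database."""
    statements = []
    for db in databases:
        target = f"ON DATABASE {quote_ident(db)} TO {quote_ident(principal)}"
        statements.append(f"GRANT USAGE {target}")
        statements.append(f"GRANT SELECT {target}")
    return statements


# Hypothetical usage in a notebook attached to a Table Access Control cluster:
#   dbs = [d.name for d in spark.catalog.listDatabases()]
#   for stmt in build_select_grants(dbs, "group-name"):
#       spark.sql(stmt)
```

Generating the statements first makes it easy to review them before anything is executed.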

thanks,

Pat.

164079
Contributor II

Thank you Pat,

When running this query I'm getting the exception below:

Error in SQL statement: SparkException: Trying to perform permission action on Hive Metastore /ANY_FILE but Table Access Control is not enabled on this cluster.

I verified that table access control is enabled in my workspace settings.

image

Pat
Honored Contributor III

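You need to use a cluster with Table Access Control (TAC) enabled. As the error message says, it has to be enabled on the cluster itself, not only in the workspace settings.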

https://docs.databricks.com/security/access-control/table-acls/table-acl.html

There were some changes to the UI recently; you can follow the instructions here:

https://docs.databricks.com/clusters/cluster-ui-preview.html

thanks,

Pat.

164079
Contributor II

Hi @Pat,

I'm getting an error after adding this config to my global init script:

spark.databricks.acl.sqlOnly true

This is the error:

image

Thank you!

Pat
Honored Contributor III

Hmm,

Let's try something like this.

When you create a new cluster, click `UI Preview` and switch it so that it shows `Legacy UI is enabled`:

image 

Choose Cluster mode: High Concurrency.

Then, in Advanced Options, enable Table Access Control:

image 

On the right side you can switch to the JSON view and compare with what I have:

{
    "autoscale": {
        "min_workers": 2,
        "max_workers": 8
    },
    "cluster_name": "Pat's Cluster",
    "spark_version": "10.4.x-scala2.12",
    "spark_conf": {
        "spark.databricks.cluster.profile": "serverless",
        "spark.databricks.repl.allowedLanguages": "python,sql",
        "spark.databricks.acl.dfAclsEnabled": "true"
    },
    "aws_attributes": {
        "first_on_demand": 1,
        "availability": "SPOT_WITH_FALLBACK",
        "zone_id": "auto",
        "spot_bid_price_percent": 100,
        "ebs_volume_type": null,
        "ebs_volume_count": null,
        "ebs_volume_size": null
    },
    "node_type_id": "i3.2xlarge",
    "ssh_public_keys": [],
    "custom_tags": {
        "ResourceClass": "Serverless"
    },
    "spark_env_vars": {
        "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
    },
    "autotermination_minutes": 0,
    "enable_elastic_disk": true,
    "cluster_source": "UI",
    "init_scripts": [],
    "data_security_mode": null,
    "runtime_engine": "STANDARD"
}
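
To check an existing cluster against the JSON above, a small Python sketch like the following could help. It assumes `spark.databricks.acl.dfAclsEnabled` is the `spark_conf` key that the Table Access Control checkbox sets, which is inferred from the JSON in this post rather than confirmed. The function name is hypothetical.

```python
import json


def has_table_access_control(cluster_spec):
    """Return True if the cluster JSON sets spark.databricks.acl.dfAclsEnabled to "true"."""
    spark_conf = cluster_spec.get("spark_conf") or {}
    value = spark_conf.get("spark.databricks.acl.dfAclsEnabled", "")
    return str(value).lower() == "true"


# Trimmed-down copy of the spark_conf section shown above.
example_spec = json.loads("""
{
    "spark_conf": {
        "spark.databricks.cluster.profile": "serverless",
        "spark.databricks.repl.allowedLanguages": "python,sql",
        "spark.databricks.acl.dfAclsEnabled": "true"
    }
}
""")

tac_enabled = has_table_access_control(example_spec)
```

Paste the JSON from your own cluster page in place of `example_spec` to compare.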

164079
Contributor II

Thanks Pat,

Yes it worked!

image 

Is there a command to show grants for a user or group?

Pat
Honored Contributor III

I think you always need to specify the securable object (the ON part):

https://docs.databricks.com/sql/language-manual/security-show-grant.html

SHOW GRANTS [ principal ] ON securable_object
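
As a rough illustration of that syntax, here is a small Python helper that fills in the template. The helper name is hypothetical, and the securable strings (`ANY FILE`, `DATABASE sales`) are only examples, so check them against the linked page before relying on them.

```python
from typing import Optional


def show_grants_statement(securable, principal=None):
    # type: (str, Optional[str]) -> str
    """Build a SHOW GRANTS statement; the principal is optional, the securable is not."""
    if not securable.strip():
        raise ValueError("a securable object is required, e.g. 'ANY FILE'")
    if principal is None:
        return f"SHOW GRANTS ON {securable}"
    quoted = "`" + principal.replace("`", "``") + "`"
    return f"SHOW GRANTS {quoted} ON {securable}"


# Hypothetical usage in a notebook:
#   spark.sql(show_grants_statement("ANY FILE", "group-name")).show()
#   spark.sql(show_grants_statement("DATABASE sales")).show()
```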

164079
Contributor II

Great,

Thank you Pat!

karthik_p
Esteemed Contributor

@Avi Edri, adding some more info to @Pat Sienkiewicz's suggestion: are you using a cluster with an instance profile? If an instance profile is configured, please validate that it has read permissions on that bucket and that the cluster the instance profile is assigned to is enabled for the user.

Hi @karthik p,

Yes, all relevant S3 bucket permissions for this user are set.

Thanks!

Thanks, @Kaniz Fatma!

I will update as soon as my issue is resolved.

Avi
