Grant Databricks Access

THIAM_HUATTAN
Valued Contributor

In the above screenshot of Grant Databricks Access, we see that we need to grant rights to a bucket at the highest level.

Why is this so?

Can we limit the rights to certain directories in a bucket, when Databricks only needs access to those directories?

Another, more important question: why is there a need to grant Databricks access to everything (add, list, modify, delete) in the AWS account at the root level? Isn't this a security concern? Or am I misinterpreting it?

Please help to clarify, thanks.

1 ACCEPTED SOLUTION


Anonymous
Not applicable

@THIAM HUAT TAN​ :

When you grant Databricks access to an S3 bucket, the policy has to reference the bucket itself, because some S3 actions operate on the bucket rather than on individual objects. Listing (s3:ListBucket), for example, is authorized against the bucket ARN, not against an object or a directory. That is why the setup screen asks for rights at the bucket level. It does not mean that permissions can only be granted for the whole bucket.

You can still limit access to specific directories within the bucket by using IAM policies that restrict access to specific prefixes. (S3 has no real directories; a "directory" is a key prefix.) Create a policy that grants object permissions only on those prefixes, restrict listing to the same prefixes with a condition, and attach the policy to the Databricks IAM role.
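As a rough illustration of such a prefix-restricted policy, here is a minimal Python sketch that builds the policy document as JSON. The bucket name, the prefixes, and the helper build_prefix_policy are hypothetical, and the list of actions is an assumption; compare it with what the Databricks setup instructions for your workspace actually ask for before using it.

```python
import json
from typing import Dict, List


def build_prefix_policy(bucket: str, prefixes: List[str]) -> Dict:
    """Build an IAM policy document limited to the given key prefixes.

    Hypothetical helper for illustration only; the action list is an
    assumption and may not match what your Databricks setup requires.
    """
    # Normalize so "team-a/raw/" and "/team-a/raw" produce the same result.
    cleaned = [p.strip("/") for p in prefixes]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so the resource is the
                # bucket ARN; the condition narrows it to the allowed prefixes.
                "Sid": "ListOnlyAllowedPrefixes",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {
                    "StringLike": {"s3:prefix": [f"{p}/*" for p in cleaned]}
                },
            },
            {
                # Object-level actions are scoped by the object ARN pattern.
                "Sid": "ReadWriteOnlyAllowedPrefixes",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{p}/*" for p in cleaned],
            },
        ],
    }


# Hypothetical bucket and prefixes.
policy = build_prefix_policy("my-example-bucket", ["team-a/raw", "team-a/curated/"])
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy on the role. Anything outside the two prefixes is neither listable nor readable through this policy, because IAM denies whatever is not explicitly allowed.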

As for granting Databricks add, list, modify, and delete rights at the root level of the AWS account: this can look like a security concern at first glance, but Databricks needs these permissions to perform its intended tasks. Note that Databricks does not automatically get access to everything in the AWS account. Access is granted only to the specific resources that Databricks requires, such as the S3 bucket that contains the data to be processed.

Furthermore, Databricks has built-in security measures to prevent unauthorized access to resources. For example, it uses IAM roles and policies to restrict access to specific resources, and it encrypts data in transit and at rest to ensure data security. Additionally, Databricks provides auditing and monitoring features that allow you to track access to resources and detect any potential security breaches.


2 REPLIES

Anonymous
Not applicable

Hi @THIAM HUAT TAN​ 

Hope all is well! Just checking in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
