
S3 Permission errors

winojoe
New Contributor III

Hello

 

Background -

I have an S3 datalake set up prior to signing up with Databricks.  I'm still in my evaluation period. 

I'm trying to read the contents of an S3 bucket but am getting all kinds of permission problems.  

Here is the command in the notebook:

dbutils.fs.ls("s3://hidden-bucket-name")

This is the result:

java.nio.file.AccessDeniedException: s3://hidden-bucket-name: getFileStatus on s3://hidden-bucket-name: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied; request: GET https://hidden-bucketname.s3.us-west-1.amazonaws.com  {key=[], key=[false], key=[2], key=[2], key=[/]} Hadoop 3.3.4, aws-sdk-java/1.12.390 Linux/5.15.0-1040-aws OpenJDK_64-Bit_Server_VM/25.372-b07 java/1.8.0_372 scala/2.12.15 kotlin/1.6.0 vendor/Azul_Systems,_Inc. cfg/retry-mode/legacy com.amazonaws.services.s3.model.ListObjectsV2Request; Request ID: VAMK134SSX7FM3VA, Extended Request ID: l/PmddVkv7otkOZnhZgeSc2HU9yiKej9ZsJ96xq3gQ+b5uQKDbw8QknQD8zJETYJM78V6jd5K74=, Cloud Provider: AWS, Instance ID: i-0feca026b7707fb3b credentials-provider: com.amazonaws.auth.AnonymousAWSCredentials credential-header: no-credential-header signature-present: false (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: VAMK134SSX7FM3VA; S3 Extended Request ID: l/PmddVkv7otkOZnhZgeSc2HU9yiKej9ZsJ96xq3gQ+b5uQKDbw8QknQD8zJETYJM78V6jd5K74=; Proxy: null), S3 Extended Request ID: l/PmddVkv7otkOZnhZgeSc2HU9yiKej9ZsJ96xq3gQ+b5uQKDbw8QknQD8zJETYJM78V6jd5K74=:AccessDenied

There is a lot going on here. A few things caught my eye:

getFileStatus: there isn't an IAM action with this specific name, so I'm not sure how to remedy this.
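For what it's worth, my understanding (an assumption on my part, not something stated in the error or the KB article) is that getFileStatus on the S3A connector translates into HEAD and LIST requests against the bucket, so the IAM actions it typically needs are s3:GetObject, s3:ListBucket, and s3:GetBucketLocation. A minimal read-only policy sketch, using the placeholder bucket name from the command above:

import json

# Read-only S3 policy sketch for the instance profile role (an assumption,
# not taken from the KB article). "hidden-bucket-name" is a placeholder.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": "arn:aws:s3:::hidden-bucket-name",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::hidden-bucket-name/*",
        },
    ],
}
print(json.dumps(read_only_policy, indent=2))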

I followed a lot of articles.

Resolutions tried:

https://kb.databricks.com/en_US/security/forbidden-access-to-s3-data

Cause

Below are the common causes:

  • AWS keys are used in addition to the IAM role. Using global init scripts to set the AWS keys can cause this behavior.

I do have AWS keys provisioned for local Spark execution against remote S3 buckets. I can't imagine that this should impact an instance of Spark running in a Databricks notebook.

  • The IAM role has the required permission to access the S3 data, but AWS keys are set in the Spark configuration. For example, setting spark.hadoop.fs.s3a.secret.key can conflict with the IAM role.

As noted above, I do have this configuration for local Spark execution, but it shouldn't have any impact on notebooks running on the Databricks cluster (a small diagnostic for this is sketched after this list).

  • Setting AWS keys at environment level on the driver node from an interactive cluster through a notebook.

I am not setting keys at the environment level.

  • DBFS mount points were created earlier with AWS keys and now trying to access using an IAM role.

Not applicable.

The files contained in the bucket were written outside Databricks, but I followed the steps here: https://docs.databricks.com/en/aws/iam/instance-profile-tutorial.html. As a side note, there is no “Step 7” in that procedure.

  • The IAM role is not attached to the cluster.

I'm not sure how to attach an IAM role to the Databricks cluster; there is an instance profile attached to the workspace, though.

 

  • The IAM role with read permission was attached, but you are trying to perform a write operation. That is, the IAM role does not have adequate permission for the operation you are trying to perform.

Not applicable
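To rule out the first two causes, here is a small diagnostic that can be run in a notebook cell. It is only a sketch, assuming a standard Databricks cluster where the spark session object is available; it checks whether static AWS keys are set in the driver environment or in the cluster's Hadoop configuration. (The exception above also shows credentials-provider: com.amazonaws.auth.AnonymousAWSCredentials, which suggests no credentials are being picked up at all.)

import os

# Check whether static AWS keys are present in the driver environment
# (these could conflict with or override the instance profile).
for var in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN"):
    print(var, "is", "set" if os.environ.get(var) else "not set")

# Check whether s3a keys or a credentials provider were injected into the
# cluster's Hadoop configuration (prints None when a key is unset).
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
for key in ("fs.s3a.access.key", "fs.s3a.secret.key",
            "fs.s3a.aws.credentials.provider"):
    print(key, "=", hadoop_conf.get(key))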

Solution

Below are the recommendations and best practices to avoid this issue:

  • Use IAM roles instead of AWS keys.

Done (a direct check of the instance profile is sketched after this list).

  • If you are trying to switch the configuration from AWS keys to IAM roles, unmount the DBFS mount points for S3 buckets created using AWS keys and remount using the IAM role.

Not applicable.  

  • Avoid using global init script to set AWS keys. Always use a cluster-scoped init script if required.

Not applicable; I am not using a global init script in Databricks.

  • Avoid setting AWS keys in a notebook or cluster Spark configuration.

I do have AWS keys provisioned, but they are only used when running Spark locally.
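As a sanity check on the "use IAM roles" recommendation, the instance profile can be exercised directly from the driver with boto3, bypassing Spark and dbutils entirely. This is a sketch under my assumptions (boto3 is available on the Databricks runtime; the bucket name and region are placeholders). If this call also returns Access Denied, the problem is in the role's policy rather than in Databricks.

import boto3

# List a few objects using whatever credentials the driver node resolves
# (with an instance profile attached, this should be the IAM role).
# Bucket name and region are placeholders.
s3 = boto3.client("s3", region_name="us-west-1")
resp = s3.list_objects_v2(Bucket="hidden-bucket-name", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"])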

Closing

Access to S3 folders is a blocker for my organization moving forward with Databricks. I appreciate any resolution the community can provide.


1 REPLY

winojoe
New Contributor III

UPDATE:

The permission problems only exist when the cluster's (compute's) access mode is "Shared No Isolation". When the access mode is either "Shared" or "Single User", the IAM configuration seems to apply as expected. When it is set to "Shared No Isolation", it's as if the IAM settings are not being applied, and a bunch of 403 errors are thrown.

Also, and this is interesting, the "Instance Profile" setting can be either "None" or the ARN from Step 6 of the procedure described in the link below; it makes no difference.

 https://docs.databricks.com/en/aws/iam/instance-profile-tutorial.html
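For anyone who wants to reproduce this, the cluster's access mode and instance profile can also be read programmatically. This is only a sketch based on my understanding of the Clusters API 2.0; the host, token, and cluster ID are placeholders, and the access mode is reported under the data_security_mode field with the instance profile under aws_attributes.

import os
import requests

# Read the cluster's access mode and instance profile via the Clusters API.
# DATABRICKS_HOST, DATABRICKS_TOKEN, and the cluster ID are placeholders.
host = os.environ["DATABRICKS_HOST"]   # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]
cluster_id = "<your-cluster-id>"

resp = requests.get(
    f"{host}/api/2.0/clusters/get",
    headers={"Authorization": f"Bearer {token}"},
    params={"cluster_id": cluster_id},
)
resp.raise_for_status()
info = resp.json()
print("data_security_mode:", info.get("data_security_mode"))
print("instance_profile_arn:",
      info.get("aws_attributes", {}).get("instance_profile_arn"))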

 
