Hello
Background -
I have an S3 data lake that was set up prior to signing up with Databricks. I'm still in my evaluation period.
I'm trying to read the contents of an S3 bucket but am getting all kinds of permission problems.
Here is the command in the notebook:
dbutils.fs.ls("s3://hidden-bucket-name")
This is the result:
java.nio.file.AccessDeniedException: s3://hidden-bucket-name: getFileStatus on s3://hidden-bucket-name: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied; request: GET https://hidden-bucketname.s3.us-west-1.amazonaws.com {key=[], key=[false], key=[2], key=[2], key=[/]} Hadoop 3.3.4, aws-sdk-java/1.12.390 Linux/5.15.0-1040-aws OpenJDK_64-Bit_Server_VM/25.372-b07 java/1.8.0_372 scala/2.12.15 kotlin/1.6.0 vendor/Azul_Systems,_Inc. cfg/retry-mode/legacy com.amazonaws.services.s3.model.ListObjectsV2Request; Request ID: VAMK134SSX7FM3VA, Extended Request ID: l/PmddVkv7otkOZnhZgeSc2HU9yiKej9ZsJ96xq3gQ+b5uQKDbw8QknQD8zJETYJM78V6jd5K74=, Cloud Provider: AWS, Instance ID: i-0feca026b7707fb3b credentials-provider: com.amazonaws.auth.AnonymousAWSCredentials credential-header: no-credential-header signature-present: false (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: VAMK134SSX7FM3VA; S3 Extended Request ID: l/PmddVkv7otkOZnhZgeSc2HU9yiKej9ZsJ96xq3gQ+b5uQKDbw8QknQD8zJETYJM78V6jd5K74=; Proxy: null), S3 Extended Request ID: l/PmddVkv7otkOZnhZgeSc2HU9yiKej9ZsJ96xq3gQ+b5uQKDbw8QknQD8zJETYJM78V6jd5K74=:AccessDenied
There is a lot going on here. A couple of things caught my eye:
- getFileStatus: there is no IAM action with this exact name, so I'm not sure which permission maps to it.
- credentials-provider: com.amazonaws.auth.AnonymousAWSCredentials and signature-present: false, which suggest the request went out with no credentials attached at all.
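From what I've been able to piece together (so take this as my assumption, not something from the docs below): getFileStatus is the Hadoop s3a client's operation rather than an IAM action, and under the hood it issues HEAD-object and ListObjectsV2 calls, which map to the s3:GetObject and s3:ListBucket IAM actions. So a read-only policy on the role would need at least something like this (note the bucket ARN vs. the bucket/* object ARN):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::hidden-bucket-name"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::hidden-bucket-name/*"
    }
  ]
}
```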
I worked through a number of articles while troubleshooting.
Resolutions tried:
https://kb.databricks.com/en_US/security/forbidden-access-to-s3-data
Cause
Below are the common causes:
- AWS keys are used in addition to the IAM role. Using global init scripts to set the AWS keys can cause this behavior.
I do have AWS keys provisioned for local Spark execution against remote S3 buckets, but I can't imagine those should affect an instance of Spark running in a Databricks notebook.
- The IAM role has the required permission to access the S3 data, but AWS keys are set in the Spark configuration. For example, setting spark.hadoop.fs.s3a.secret.key can conflict with the IAM role.
See above: I do have this configuration for local Spark execution, but as I noted, it shouldn't have any impact on notebooks running in Databricks.
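One way to verify this from the notebook itself would be to probe the cluster's Hadoop configuration for any s3a credential keys. This is just a sketch, not official Databricks tooling: the key names are the standard Hadoop s3a properties, and the `sc._jsc.hadoopConfiguration()` accessor in the usage comment is a PySpark internal, so treat it as a best-effort check.

```python
# Sketch: probe for stray s3a credential settings that could override the IAM role.
# The key names below are standard Hadoop s3a properties; `get_conf` is any
# callable that maps a key to its value (or None when the key is unset).
S3A_CRED_KEYS = [
    "fs.s3a.access.key",
    "fs.s3a.secret.key",
    "fs.s3a.session.token",
    "fs.s3a.aws.credentials.provider",
]

def find_stray_s3a_creds(get_conf):
    """Return the subset of s3a credential keys that are actually set."""
    found = {}
    for key in S3A_CRED_KEYS:
        value = get_conf(key)
        if value is not None:
            found[key] = value
    return found

# In a notebook you would pass the cluster's Hadoop conf getter (PySpark internal):
#   find_stray_s3a_creds(sc._jsc.hadoopConfiguration().get)
```

If this returns anything non-empty on the cluster, those settings would take precedence over the instance profile.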
- Setting AWS keys at environment level on the driver node from an interactive cluster through a notebook.
Not setting keys at environment level
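Still, since the AWS SDK's default credential chain will pick up key material from the driver's environment if it is there, a one-cell sanity check seemed worth it. A minimal sketch; the variable names are the standard AWS SDK credential environment variables:

```python
import os

# Standard AWS SDK credential environment variables; if any of these are set on
# the driver, they can take precedence over the cluster's instance profile.
AWS_ENV_VARS = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN"]

def aws_env_vars_set(environ=None):
    """Return the AWS credential variables present in the given environment."""
    if environ is None:
        environ = os.environ
    return [name for name in AWS_ENV_VARS if name in environ]

# In a notebook cell: print(aws_env_vars_set())
# Expect an empty list when relying purely on the IAM role.
```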
- DBFS mount points were created earlier with AWS keys and now trying to access using an IAM role.
Not applicable.
The files contained in the bucket were written outside Databricks, but I followed the steps here: https://docs.databricks.com/en/aws/iam/instance-profile-tutorial.html. As a side note, there is no "Step 7" in that procedure.
- The IAM role is not attached to the cluster.
Not sure how to attach an IAM role to the Databricks cluster... there is an instance profile attached to the workspace, though.
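From the docs I've read, my understanding (which could be wrong) is that registering the instance profile with the workspace only makes it available; it still has to be selected on each cluster, either in the cluster configuration UI or via the Clusters API. Something like this fragment of a cluster spec, with a made-up ARN:

```json
{
  "cluster_name": "eval-cluster",
  "aws_attributes": {
    "instance_profile_arn": "arn:aws:iam::123456789012:instance-profile/my-databricks-profile"
  }
}
```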
- The IAM role with read permission was attached, but you are trying to perform a write operation. That is, the IAM role does not have adequate permission for the operation you are trying to perform.
Not applicable
Solution
Below are the recommendations and best practices to avoid this issue:
- Use IAM roles instead of AWS keys.
Done.
- If you are trying to switch the configuration from AWS keys to IAM roles, unmount the DBFS mount points for S3 buckets created using AWS keys and remount using the IAM role.
Not applicable.
- Avoid using global init script to set AWS keys. Always use a cluster-scoped init script if required.
Not applicable; I am not using a global init script in Databricks.
- Avoid setting AWS keys in a notebook or cluster Spark configuration.
I do have AWS keys provisioned, but they are only used when running Spark locally.
Closing
Access to S3 buckets is a blocker for my organization in moving forward with Databricks. I appreciate any resolution the community can provide.