Access denied error while reading file from S3 to spark

Monika_Bagyal
New Contributor

I'm seeing an Access Denied error from the Spark cluster while reading an S3 file into a notebook.

Running on personal single-user compute with Databricks Runtime 13.3 LTS ML.

The config setup looks like this:

spark.conf.set("spark.hadoop.fs.s3a.access.key", access_id)
spark.conf.set("spark.hadoop.fs.s3a.secret.key", access_key)
spark.conf.set("spark.hadoop.fs.s3a.session.token", session_token)
spark.conf.set("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
spark.conf.set("spark.hadoop.fs.s3a.endpoint", "s3.us-east-1.amazonaws.com")
 
The code block looks like this:
file_location = "s3://bucket_name/"
file_type = "parquet"
df = spark.read.format(file_type).load(file_location)
display(df.limit(5))  # note: df.head() returns Row objects, not a DataFrame, so display(df.head()) would fail even with working credentials


The error I'm getting:
java.nio.file.AccessDeniedException: s3://bucket_name/xxx.parquet: getFileStatus on s3://bucket_name/xxx.parquet: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request: HEAD https://bucket_name.parquet {} Hadoop 3.3.4, aws-sdk-java/1.12.390 Linux/5.15.0-1045-aws OpenJDK_64-Bit_Server_VM/25.372-b07 java/1.8.0_372 scala/2.12.15 kotlin/1.6.0 vendor/Azul_Systems,_Inc. cfg/retry-mode/legacy com.amazonaws.services.s3.model.GetObjectMetadataRequest; Request ID: RD3ZAB9V0G6C4W7B, Extended Request ID: 7BDXsMzY0O6RwMdKfFLlGuHlw2AkKj0+O2U6vL2UnF1nXzu9sDsVtPVH4qXv5sYzLf8vV65sNdU=, Cloud Provider: AWS, Instance ID: i-06f065a5b0db0e707 credentials-provider: com.amazonaws.auth.AnonymousAWSCredentials credential-header: no-credential-header signature-present: false (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: RD3ZAB9V0G6C4W7B; S3 Extended Request ID: 7BDXsMzY0O6RwMdKfFLlGuHlw2AkKj0+O2U6vL2UnF1nXzu9sDsVtPVH4qXv5sYzLf8vV65sNdU=; Proxy: null), S3 Extended Request ID: 7BDXsMzY0O6RwMdKfFLlGuHlw2AkKj0+O2U6vL2UnF1nXzu9sDsVtPVH4qXv5sYzLf8vV65sNdU=:403 Forbidden
 
 
Please help.
1 REPLY

Kaniz
Community Manager

Hi @Monika_Bagyal, the "Access Denied" (403 Forbidden) error you are seeing is most likely caused by insufficient permissions to read the S3 bucket.

 

The configuration you've set up is correct for accessing S3 with temporary AWS credentials, but the credentials themselves, or the permissions attached to them, may not grant access to that bucket.

 

Here are some possible solutions:

 

1. **Check your AWS credentials**: Ensure that the access_id, access_key, and session_token you are using are correct and have not expired.

2. **Check your AWS permissions**: The AWS credentials you are using should have the necessary permissions to read the S3 bucket. You might need to adjust your AWS IAM policies to allow access to the S3 bucket.

3. **Check your bucket policy**: Your S3 bucket policy should allow your AWS credentials to read data. You might need to adjust your bucket policy to allow access.

4. **Check your endpoint**: Make sure the endpoint you are using is correct. It should match the region where your S3 bucket is located.
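One detail worth noting from your stack trace: the request went out with `com.amazonaws.auth.AnonymousAWSCredentials` and `signature-present: false`, i.e. the S3A client made an unsigned request, which suggests the credentials may never have reached it. A hedged sketch of one workaround (the helper function name is illustrative, and applying settings via `spark.sparkContext._jsc.hadoopConfiguration()` is a commonly used but internal PySpark handle, not an official API) is to set the `fs.s3a.*` keys directly on the Hadoop configuration of the running session:

```python
def s3a_temp_credential_conf(access_id, access_key, session_token,
                             region="us-east-1"):
    """Build the fs.s3a settings for temporary (STS) credentials."""
    return {
        "fs.s3a.access.key": access_id,
        "fs.s3a.secret.key": access_key,
        "fs.s3a.session.token": session_token,
        "fs.s3a.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
        "fs.s3a.endpoint": f"s3.{region}.amazonaws.com",
    }

# Applying the settings on a live session (sketch; `spark` is your
# existing SparkSession on the cluster):
# hconf = spark.sparkContext._jsc.hadoopConfiguration()
# for key, value in s3a_temp_credential_conf(access_id, access_key,
#                                            session_token).items():
#     hconf.set(key, value)
```

Note the keys here drop the `spark.hadoop.` prefix, which is only used when the settings are passed through Spark configuration (e.g. in the cluster's Spark config UI) rather than set on the Hadoop configuration directly.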

 

If you've checked all of these and are still seeing the error, the problem may be specific to your setup, and you might need to contact Databricks support by filing a support ticket, or contact AWS support, for further assistance.
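To isolate whether the credentials themselves are the problem, independent of Spark, you can issue the same HEAD request directly with boto3 (pre-installed on Databricks ML runtimes). This is an illustrative helper, not something from the thread; the function name and parameters are mine. If this also raises a 403, the issue is the credentials or policy, not the Spark configuration:

```python
def check_s3_access(access_id, access_key, session_token, bucket, key,
                    region="us-east-1"):
    """HEAD an object with the given temporary credentials.

    Raises botocore.exceptions.ClientError with a 403 status if access
    is denied -- the same failure mode as Spark's getFileStatus call.
    """
    import boto3  # assumed available; pre-installed on Databricks ML runtimes

    s3 = boto3.client(
        "s3",
        aws_access_key_id=access_id,
        aws_secret_access_key=access_key,
        aws_session_token=session_token,
        region_name=region,
    )
    # head_object performs the same HEAD request seen in the error above,
    # but without the S3A layer in between.
    return s3.head_object(Bucket=bucket, Key=key)
```

Call it with the same values you pass to `spark.conf.set` and a real object key from the bucket, e.g. `check_s3_access(access_id, access_key, session_token, "bucket_name", "xxx.parquet")`.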
