
Unable to access AWS S3 - Error : java.nio.file.AccessDeniedException

Madhawa
New Contributor II

Reading data like this: Data = spark.sql("SELECT * FROM edge.inv.rm")

Getting this error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 441.0 failed 4 times, most recent failure: Lost task 10.3 in stage 441.0 (TID 204) (XX.XX.X.XX executor 0): com.databricks.sql.io.FileReadException: Error while reading file s3://edge-dataproducts-s3/inv/RM/part-002-c7e-12-8d9-fa1.c0.snappy.parquet.

Caused by: java.nio.file.AccessDeniedException: s3a://edge-dataproducts-s3/inv/RM/part-002-c7e-12-8d9-fa1.c0.snappy.parquet

I have tried different cluster types and Spark runtime versions, but I still get the same error.

Any suggestions to solve this error?

1 Accepted Solution

Kaniz_Fatma
Community Manager

Hi @Madhawa,

  • Ensure that the AWS credentials (access key and secret key) are correctly configured in your Spark application. You can set them using spark.conf.set("spark.hadoop.fs.s3a.access.key", "your_access_key") and spark.conf.set("spark.hadoop.fs.s3a.secret.key", "your_secret_key"); a config sketch follows this list.
  • Verify that the IAM user associated with these credentials has the necessary permissions to read from the S3 bucket.
  • Set the S3 endpoint correctly using spark.conf.set("spark.hadoop.fs.s3a.endpoint", "your_s3_endpoint"). Replace "your_s3_endpoint" with the actual endpoint for your S3 region (e.g., "s3.amazonaws.com"), and make sure the region matches the S3 bucket's region.
  • Double-check the file path: s3a://edge-dataproducts-s3/inv/RM/part-002-c7e-12-8d9-fa1.c0.snappy.parquet. Ensure that the file exists in the specified location, and verify that the bucket name, folder structure, and file name are accurate (see the path-check sketch below).
  • If you're using an EMR cluster, ensure that the cluster's security group allows outbound traffic to S3, and check for any network-related issues (firewalls, VPC settings, etc.).
  • Consider upgrading Spark to the latest version (if possible) to benefit from bug fixes and improvements related to S3 interactions.
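
A minimal sketch of the credential and endpoint settings above, assuming a Databricks notebook, with placeholder values (your_access_key, your_secret_key, and the endpoint) to replace with your own. Depending on the runtime, the spark.hadoop.* settings may need to go in the cluster's Spark config rather than a notebook cell, and secret scopes or instance profiles are safer than hard-coded keys:

    # Placeholder credentials -- substitute your own, or better, load them
    # from a Databricks secret scope instead of hard-coding them.
    spark.conf.set("spark.hadoop.fs.s3a.access.key", "your_access_key")
    spark.conf.set("spark.hadoop.fs.s3a.secret.key", "your_secret_key")
    # The endpoint must match the bucket's region,
    # e.g. s3.us-east-1.amazonaws.com for us-east-1.
    spark.conf.set("spark.hadoop.fs.s3a.endpoint", "s3.amazonaws.com")

    # If the settings are right, a direct read of the failing location
    # should now succeed.
    df = spark.read.parquet("s3a://edge-dataproducts-s3/inv/RM/")
    df.limit(5).show()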

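To separate a permissions problem from a missing file, a small listing check along these lines can help; it assumes dbutils (available in Databricks notebooks) and uses the parent folder from the error message above:

    # List the parent directory of the failing file. An AccessDeniedException
    # here confirms a credentials/IAM problem; a FileNotFoundException points
    # at the path instead.
    path = "s3a://edge-dataproducts-s3/inv/RM/"
    try:
        files = dbutils.fs.ls(path)
        print(len(files), "objects under", path)
        for f in files[:5]:
            print(f.name, f.size)
    except Exception as e:
        print("Listing failed:", e)
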
If you've tried all these steps and still face issues, please provide additional details, and we'll continue troubleshooting!

For more information, you can refer to this Stack Overflow thread.


2 Replies


Madhawa
New Contributor II

My concern is that sometimes I am able to run it without any errors, and other times I get this error. Why is that?
