
Unable to access AWS S3 - Error : java.nio.file.AccessDeniedException

Madhawa
New Contributor II

Reading a table like this:

data = spark.sql("SELECT * FROM edge.inv.rm")

Getting this error 

org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 441.0 failed 4 times, most recent failure: Lost task 10.3 in stage 441.0 (TID 204) (XX.XX.X.XX executor 0): com.databricks.sql.io.FileReadException: Error while reading file s3://edge-dataproducts-s3/inv/RM/part-002-c7e-12-8d9-fa1.c0.snappy.parquet.                                                                                                                      

Caused by: java.nio.file.AccessDeniedException: s3a://edge-dataproducts-s3/inv/RM/part-002-c7e-12-8d9-fa1.c0.snappy.parquet   

 

I have tried different cluster types and Spark runtime versions, but I still get the same error.

Any suggestions for resolving this error?

1 ACCEPTED SOLUTION

Kaniz_Fatma
Community Manager

Hi @Madhawa,

  • Ensure that the AWS credentials (access key and secret key) are correctly configured in your Spark application. You can set them using spark.conf.set("spark.hadoop.fs.s3a.access.key", "your_access_key") and spark.conf.set("spark.hadoop.fs.s3a.secret.key", "your_secret_key") (see the sketch after this list).
  • Verify that the IAM user associated with these credentials has the necessary permissions to read from the S3 bucket.
  • Set the S3 endpoint correctly using spark.conf.set("spark.hadoop.fs.s3a.endpoint", "your_s3_endpoint"). Replace "your_s3_endpoint" with the actual endpoint for your S3 region (e.g., "s3.amazonaws.com").
  • Make sure the region matches the S3 bucket’s region.
  • Double-check the file path: s3a://edge-dataproducts-s3/inv/RM/part-002-c7e-12-8d9-fa1.c0.snappy.parquet. Ensure that the file exists in the specified location.
  • Verify that the bucket name, folder structure, and file name are accurate.
  • If you’re using an EMR cluster, ensure that the cluster’s security group allows outbound traffic to S3.
  • Check if there are any network-related issues (firewalls, VPC settings, etc.).
  • Consider upgrading Spark to the latest version (if possible) to benefit from bug fixes and improvements related to S3 interactions.
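
Putting the credential, endpoint, and path checks together, here is a minimal sketch of how this could look in a Databricks notebook. The access key, secret key, and endpoint values are placeholders (assumptions, not values from this thread), and dbutils.fs.ls is used only to confirm the path is reachable:

# Minimal sketch -- the credential and endpoint values below are
# placeholders; substitute the real values for your account and the
# endpoint matching your bucket's region.
spark.conf.set("spark.hadoop.fs.s3a.access.key", "your_access_key")
spark.conf.set("spark.hadoop.fs.s3a.secret.key", "your_secret_key")
spark.conf.set("spark.hadoop.fs.s3a.endpoint", "s3.amazonaws.com")

# Confirm the objects exist and are reachable before suspecting the file
# itself (dbutils and display are available in Databricks notebooks).
display(dbutils.fs.ls("s3a://edge-dataproducts-s3/inv/RM/"))

# Retry the read directly against the prefix from the error message.
df = spark.read.parquet("s3a://edge-dataproducts-s3/inv/RM/")
df.limit(10).show()

If the dbutils.fs.ls call itself fails with an AccessDeniedException, the problem lies with the credentials or bucket policy rather than with any individual file.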

If you’ve tried all these steps and still face issues, please provide additional details, and we’ll continue troubleshooting! 

For more information, you can refer to this Stack Overflow thread.


2 REPLIES


Madhawa
New Contributor II

My concern is that sometimes I am able to run it without any errors, and other times I get this error. Why is that?
