sparklyr::spark_read_csv forbidden 403 error

thethirtyfour
New Contributor III

Hi,

I am trying to read a CSV file into a Spark DataFrame using sparklyr::spark_read_csv, but I am receiving a 403 access denied error.

I have stored my AWS credentials as environment variables and can successfully read the file into an R data frame using arrow::read_csv_arrow. However, spark_read_csv fails.

I have confirmed that I am connected to Spark and can read Parquet files stored elsewhere.

Any advice?

Thanks,

my_file <- glue::glue("s3://my-bucket/my-folder/my-file-name.csv")

## This works
mydata <- arrow::read_csv_arrow(
  file = my_file
)

## This doesn't
mydata <- sparklyr::spark_read_csv(
  sc,
  name = "mydata",
  path = my_file
)

# Error message
Error : java.nio.file.AccessDeniedException

Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request

1 REPLY

Kaniz
Community Manager

Hi @thethirtyfour, it seems you’re encountering a 403 Forbidden error when trying to read a CSV file into a Spark DataFrame using sparklyr::spark_read_csv.

Let’s troubleshoot this issue and explore potential solutions:

  1. IAM Roles vs. AWS Keys:

    • arrow::read_csv_arrow reads your AWS credentials directly from the R session’s environment variables, but Spark’s S3 connector runs in the JVM and picks up its credentials from the Spark/Hadoop configuration or from the cluster’s IAM role (instance profile) instead. Pass your keys into the Spark configuration, or attach an instance profile to the cluster, as shown in the sketch after this list.

  2. Check Permissions:

    • Verify that the user running the notebook or script has the necessary permissions to access the S3 bucket.
    • If you’re using Azure Synapse Analytics, consider adding the RBAC Storage Blob Data Contributor role to the user. You can do this after workspace creation.
  3. File Permissions:

    • Confirm that the bucket policy and object ACL actually grant s3:GetObject on the file (and s3:ListBucket on the bucket) to the identity you are using; a policy that covers one prefix but not another would explain why Parquet files stored elsewhere are readable while this CSV is not.

  4. Cache Invalidation:

    • If credentials or permissions were changed recently, restart the cluster or the Spark session so that any cached credentials are refreshed.
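
Here’s a minimal sketch of point 1, assuming your credentials live in the standard AWS environment variables (the master URL, bucket, and file name below are placeholders, adapt them to your setup):

library(sparklyr)

# Hand the same credentials that arrow picks up from the environment
# to Spark's Hadoop S3A connector before connecting
conf <- spark_config()
conf$spark.hadoop.fs.s3a.access.key <- Sys.getenv("AWS_ACCESS_KEY_ID")
conf$spark.hadoop.fs.s3a.secret.key <- Sys.getenv("AWS_SECRET_ACCESS_KEY")
# If you use temporary credentials, also set spark.hadoop.fs.s3a.session.token

sc <- spark_connect(master = "local", config = conf)

mydata <- spark_read_csv(
  sc,
  name = "mydata",
  path = "s3a://my-bucket/my-folder/my-file-name.csv"
)

On a Databricks cluster, attaching an instance profile that can read the bucket is usually cleaner than embedding keys; with the role attached, your original spark_read_csv call should work unchanged.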

Hopefully, one of these steps will help resolve the access denied issue. If you continue to encounter problems, feel free to ask for further assistance! 🚀