sparklyr::spark_read_csv forbidden 403 error

thethirtyfour
New Contributor III

Hi,

I am trying to read a CSV file into a Spark DataFrame using sparklyr::spark_read_csv, but I am getting a 403 Access Denied error.

I have stored my AWS credentials as environment variables and can successfully read the file into an R data frame using arrow::read_csv_arrow. However, spark_read_csv fails.

I have confirmed that I am connected to Spark and can read Parquet files stored elsewhere.

Any advice?

Thanks,

my_file <- glue::glue("s3://my-bucket/my-folder/my-file-name.csv")

## This works
mydata <- arrow::read_csv_arrow(
  file = my_file
)

## This doesn't
mydata <- sparklyr::spark_read_csv(
  sc,
  name = "mydata",
  path = my_file
)

# Error message
Error : java.nio.file.AccessDeniedException

Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request

1 REPLY

Kaniz
Community Manager

Hi @thethirtyfour, it seems you're encountering a 403 Forbidden error when trying to read a CSV file into a Spark DataFrame using sparklyr::spark_read_csv.

Let's troubleshoot this issue and explore potential solutions:

  1. IAM Roles vs. AWS Keys: Credentials stored as environment variables inside the R session are visible to arrow, but not to the Spark JVM, which is why arrow::read_csv_arrow succeeds while Spark's S3 connector is denied. Prefer an instance profile (IAM role) attached to the cluster, or pass the keys through to Spark explicitly (see the sketch after this list).

  2. Check Permissions:

    • Verify that the user running the notebook or script has the necessary permissions to access the S3 bucket.
    • If you're using Azure Synapse Analytics, consider adding the RBAC Storage Blob Data Contributor role to the user. You can do this after workspace creation.

  3. File Permissions: Confirm that the identity the cluster runs as is allowed s3:GetObject on the object and s3:ListBucket on the bucket, not only on the Parquet locations you can already read.

  4. Cache Invalidation: If keys or bucket policies were rotated recently, restart the cluster or reconnect the Spark session so the connector picks up the new values instead of cached credentials.

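For point 1, here is a minimal sketch of handing the keys to Spark's S3A connector from sparklyr. It assumes the s3:// path resolves to S3A (as the AmazonS3Exception in your stack trace suggests) and reuses the AWS_* environment variables from your post:

## Option 1: supply the keys at connect time via the Spark config.
## Note: on Databricks, spark_connect(method = "databricks") attaches
## to an already-running session, so connect-time config may not take
## effect there; use Option 2 instead.
conf <- sparklyr::spark_config()
conf$spark.hadoop.fs.s3a.access.key <- Sys.getenv("AWS_ACCESS_KEY_ID")
conf$spark.hadoop.fs.s3a.secret.key <- Sys.getenv("AWS_SECRET_ACCESS_KEY")
sc <- sparklyr::spark_connect(method = "databricks", config = conf)

## Option 2: set the keys on the live connection through the
## Hadoop configuration object.
hconf <- sparklyr::invoke(sparklyr::spark_context(sc), "hadoopConfiguration")
sparklyr::invoke(hconf, "set", "fs.s3a.access.key", Sys.getenv("AWS_ACCESS_KEY_ID"))
sparklyr::invoke(hconf, "set", "fs.s3a.secret.key", Sys.getenv("AWS_SECRET_ACCESS_KEY"))

## The original read should then authenticate:
mydata <- sparklyr::spark_read_csv(
  sc,
  name = "mydata",
  path = "s3://my-bucket/my-folder/my-file-name.csv"
)

With an instance profile attached to the cluster, none of this is needed and no keys appear in code, which is generally the safer route.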
Hopefully, one of these steps will help resolve the access denied issue. If you continue to encounter problems, feel free to ask for further assistance! 🚀