How to read data from an S3 Access Point with PySpark?

yutaro_ono1_558
New Contributor II

I want to read data from an S3 access point.

I can successfully read the data through the S3 access point with boto3:

import boto3

s3 = boto3.resource('s3')
# The access point ARN is passed in place of the bucket name
ap = s3.Bucket('arn:aws:s3:[region]:[aws account id]:accesspoint/[S3 Access Point name]')
for obj in ap.objects.all():
    print(obj.key)
    print(obj.get()['Body'].read())

I then tried to read through the S3 access point with PySpark.

However, I cannot access the S3 access point; the read fails with the error "java.lang.NullPointerException: null uri host. This can be caused by unencoded / in the password string".

# Cannot access the data
# https://[s3-accesspoint-name]-[accountid].s3-accesspoint.[region].amazonaws.com/[file path]
df = spark.read.csv('s3a://arn:aws:s3:[region]:[aws account id]:accesspoint/[S3 access point name]/[data file path]')
df.show()

How can I access the data through the S3 Access Point?

S3 Access Point

https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-access-points.html
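
The "null uri host" message is consistent with the ARN sitting where a URI expects a host name: the colons in the ARN cannot be read as a host. The sketch below is only a Python illustration of that parsing problem, not the Java code path Spark actually runs. The ARN values and the helper names `split_access_point_arn` and `access_point_host` are hypothetical; the host name format follows the URL shown in the comment of the failing snippet above.

```python
from urllib.parse import urlsplit

# Hypothetical example values; substitute your own region, account ID and name.
ARN = "arn:aws:s3:us-east-1:123456789012:accesspoint/my-access-point"


def split_access_point_arn(arn: str) -> dict:
    """Split an S3 access point ARN into region, account ID and name."""
    prefix, _partition, service, region, account_id, resource = arn.split(":", 5)
    resource_type, _, name = resource.partition("/")
    if prefix != "arn" or service != "s3" or resource_type != "accesspoint" or not name:
        raise ValueError(f"not an S3 access point ARN: {arn!r}")
    return {"region": region, "account_id": account_id, "name": name}


def access_point_host(arn: str) -> str:
    """Build the [name]-[account id].s3-accesspoint.[region].amazonaws.com host."""
    p = split_access_point_arn(arn)
    return f"{p['name']}-{p['account_id']}.s3-accesspoint.{p['region']}.amazonaws.com"


# Used as a URI authority, the ARN is cut at its first ':' and the rest
# is treated as a (non-numeric) port, so no usable host is left.
print(urlsplit(f"s3a://{ARN}/data/file.csv").hostname)
print(access_point_host(ARN))
```

Python keeps only the text before the first colon as the host; a stricter URI parser may report no host at all, which would match the error above.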

1 ACCEPTED SOLUTION

Kaniz
Community Manager

Hi @Yutaro Ono and @Niclas Ahlqvist Lindqvist, the problem is that you have provided the ARN instead of the S3 URL. The URL would look something like this (assuming the access point is the bucket name):

s3://accesspoint/access-point/prefix/

In the AWS console, when you are viewing the object or prefix, there is a "Copy S3 URL" button at the top right.
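
One way to keep the ARN out of the path is to map an ordinary bucket-style name to the access point in the S3A configuration. The sketch below assumes the open-source Hadoop S3A connector, version 3.3.2 or later, which documents the per-bucket property `fs.s3a.bucket.<name>.accesspoint.arn`. Whether a given Databricks runtime honours this property is an assumption you would need to verify. The helper `access_point_spark_conf`, the bucket name `my-data` and the ARN values are hypothetical.

```python
def access_point_spark_conf(bucket_name: str, access_point_arn: str) -> dict:
    """Return Spark settings that route s3a://<bucket_name>/ through an access point.

    Assumption: Hadoop S3A 3.3.2+; the "spark.hadoop." prefix passes the
    property through to the Hadoop configuration.
    """
    if not bucket_name or ":" in bucket_name or "/" in bucket_name:
        raise ValueError("bucket_name must be a plain bucket-style name, not an ARN or path")
    key = f"spark.hadoop.fs.s3a.bucket.{bucket_name}.accesspoint.arn"
    return {key: access_point_arn}


# Hypothetical example values.
conf = access_point_spark_conf(
    "my-data",
    "arn:aws:s3:us-east-1:123456789012:accesspoint/my-access-point",
)
for key, value in conf.items():
    print(f"{key}={value}")

# Possible use in a Spark session (not executed here):
#   builder = SparkSession.builder
#   for key, value in conf.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
#   df = spark.read.csv("s3a://my-data/path/to/file.csv")
```

With this mapping the path passed to `spark.read` contains only a normal host-style name, so the URI parsing problem from the question does not arise.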

5 REPLIES 5

Kaniz
Community Manager

Hi @Yutaro Ono! My name is Kaniz, and I'm a technical moderator here. Great to meet you, and thanks for your question! Let's first see if your peers on the Forum have an answer to your question. Otherwise, I will follow up shortly with a response.

Niclas
New Contributor II

Did you get around to following up on this issue, Kaniz?

Kaniz
Community Manager

Hi @Niclas Ahlqvist Lindqvist and @Yutaro Ono, were you able to resolve your issue with the help of my response?

shrestha-rj
New Contributor II

I'm reaching out to seek assistance with an issue. I'm trying to read JSON files from an S3 Multi-Region Access Point using a Databricks notebook. Reading directly from the S3 bucket works without problems, but I get a "java.nio.file.AccessDeniedException" error when I try to read from the Multi-Region Access Point. Any guidance or support you can provide would be greatly appreciated.

 
spark.read.json("s3://<bucket-name>/").display()  # No issue

spark.read.json("s3://accesspoint/<ap-name>.mrap").display()  # java.nio.file.AccessDeniedException