How to read data from S3 Access Point by pyspark?
07-17-2021 04:07 PM
I want to read data from an S3 Access Point.
I can successfully access the data through the access point using the boto3 client:
import boto3

s3 = boto3.resource('s3')
# Pass the access point ARN where a bucket name is normally expected
ap = s3.Bucket('arn:aws:s3:[region]:[aws account id]:accesspoint/[S3 Access Point name]')
for obj in ap.objects.all():
    print(obj.key)
    print(obj.get()['Body'].read())
Next, I tried to read through the S3 Access Point with PySpark.
However, the read fails with the error "java.lang.NullPointerException: null uri host. This can be caused by unencoded / in the password string".
# Cannot access the data
# https://[s3-accesspoint-name]-[accountid].s3-accesspoint.[region].amazonaws.com/[file path]
df = spark.read.csv('s3a://arn:aws:s3:[region]:[aws account id]:accesspoint/[S3 access point name]/[data file path]')
df.show()
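The commented URL above follows the access point's virtual-hosted-style endpoint pattern, and the error is most likely raised because an ARN contains colons and so cannot be parsed as the host part of an s3a:// URI. As a minimal sketch, assuming an ARN in the standard `aws` partition with the form shown above, the endpoint host can be derived from the ARN like this. The function name `access_point_host` and the sample values are hypothetical and used only for illustration.

```python
def access_point_host(arn: str) -> str:
    """Derive the access point endpoint host from an access point ARN.

    Assumes the form: arn:aws:s3:<region>:<account-id>:accesspoint/<name>
    (hypothetical helper, not part of boto3 or PySpark).
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[:3] != ["arn", "aws", "s3"]:
        raise ValueError(f"not an S3 ARN: {arn!r}")
    region, account_id, resource = parts[3], parts[4], parts[5]

    # The resource part is "accesspoint/<name>"
    kind, _, name = resource.partition("/")
    if kind != "accesspoint" or not name:
        raise ValueError(f"not an access point ARN: {arn!r}")

    # Pattern taken from the URL in the post:
    # [name]-[account id].s3-accesspoint.[region].amazonaws.com
    return f"{name}-{account_id}.s3-accesspoint.{region}.amazonaws.com"


# Sample values are made up for illustration
host = access_point_host("arn:aws:s3:us-west-2:123456789012:accesspoint/my-ap")
print(host)
```

This only builds the host name; it does not make PySpark accept the ARN as a path.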
How can I access the data through the S3 Access Point from PySpark?
S3 Access Point
https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-access-points.html
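One approach that may apply: the open-source Hadoop S3A connector (version 3.3.2 and later) documents a per-bucket option, `fs.s3a.bucket.<name>.accesspoint.arn`, that maps a bucket-style name to an access point ARN so the ARN never has to appear in the path. The sketch below only builds the Spark configuration entry and the matching s3a:// path. The helper names, the name `my-ap`, and the sample ARN are hypothetical, and whether a particular Databricks Runtime honors this option is an assumption that has not been verified here.

```python
def access_point_conf(name: str, arn: str) -> dict:
    """Build the Spark config entry that maps `name` to an access point ARN.

    Uses the Hadoop S3A per-bucket option fs.s3a.bucket.<name>.accesspoint.arn,
    passed through Spark with the "spark.hadoop." prefix.
    """
    return {f"spark.hadoop.fs.s3a.bucket.{name}.accesspoint.arn": arn}


def s3a_path(name: str, key: str) -> str:
    """Build an s3a:// path that uses `name` in place of the ARN."""
    return f"s3a://{name}/{key.lstrip('/')}"


# Sample values are made up for illustration
conf = access_point_conf(
    "my-ap", "arn:aws:s3:us-west-2:123456789012:accesspoint/my-ap"
)
path = s3a_path("my-ap", "dir/data.csv")
print(conf)
print(path)
```

The config entry would need to be in place before the Spark session is created (for example in the cluster's Spark config), after which `spark.read.csv(path)` can be tried with the path built above.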
Labels: Mount point data lake, Pyspark, Read from s3
01-21-2022 12:58 AM
Did you get a chance to follow up on this issue, Kaniz?
11-10-2023 11:48 AM
I'm looking for help with a related issue. I'm trying to read JSON files from an S3 Multi-Region Access Point in a Databricks notebook. Reading directly from the S3 bucket works, but reading through the Multi-Region Access Point fails with a "java.nio.file.AccessDeniedException" error. Any guidance would be greatly appreciated.