2 weeks ago
Context
I'm working on integration patterns between enterprise NAS storage (Amazon FSx for NetApp ONTAP) and Databricks via S3 Access Points. S3 Access Points provide S3 API access to file data without copying — a common pattern for organizations with existing NFS/SMB workloads.
I've documented my findings publicly here: https://github.com/Yoshiki0705/fsxn-lakehouse-integrations
What I've observed
When registering an S3 Access Point as a UC External Location:
The behavior suggests the session policy generated during AssumeRole doesn't correctly handle S3 AP ARN format (arn:aws:s3:REGION:ACCOUNT:accesspoint/NAME).
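The two ARN shapes differ in more than the prefix. Bucket ARNs leave the region and account fields empty, while access point ARNs fill both in. The minimal Python sketch below tells them apart. `classify_s3_arn` and `S3Target` are hypothetical helpers, not part of any AWS or Databricks SDK, and the account ID and names are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class S3Target:
    kind: str                     # "bucket" or "access_point"
    name: str                     # bucket name or access point name
    region: Optional[str] = None  # only set for access points
    account: Optional[str] = None # only set for access points


def classify_s3_arn(arn: str) -> S3Target:
    """Hypothetical helper: distinguish a bucket ARN from an access point ARN.

    Bucket:       arn:aws:s3:::NAME[/key]
    Access point: arn:aws:s3:REGION:ACCOUNT:accesspoint/NAME[/object/key]
    """
    # Split into arn, partition, service, region, account, resource.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3":
        raise ValueError(f"not an S3 ARN: {arn!r}")
    _, _partition, _service, region, account, resource = parts
    if resource.startswith("accesspoint/"):
        # The access point name is the first path segment after "accesspoint/".
        name = resource[len("accesspoint/"):].split("/", 1)[0]
        return S3Target("access_point", name, region, account)
    if not region and not account:
        # Bucket ARNs have empty region and account fields.
        return S3Target("bucket", resource.split("/", 1)[0])
    raise ValueError(f"unrecognized S3 resource: {resource!r}")
```

A generator that only understands the first shape has nowhere to put the region, account, and `accesspoint/NAME` components of the second.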
Technical details
Questions for the community
Has anyone successfully registered an S3 Access Point (not a standard S3 bucket) as a UC External Location? If so, what configuration was needed?
Is there a documented limitation or roadmap item for S3 AP support in UC? I couldn't find this in the current documentation.
For those using the Storage Ecosystem partners (announced at DAIS 2026) — does the native integration bypass this limitation by using a different registration path?
Current workaround
I'm currently using DataSync → standard S3 bucket → UC External Location, which works but introduces an extra data copy. For read-only analytics, Athena and Snowflake can query the S3 AP directly, so this is specifically a UC limitation.
Why this matters to the community
Many organizations store data on enterprise NAS (NFS/SMB) and want to use Databricks for ML/AI without duplicating everything to S3. S3 Access Points are designed exactly for this "access without copy" pattern. If UC could support S3 AP ARNs, it would enable zero-copy governed analytics on enterprise file storage — benefiting anyone with NAS-resident data.
Environment: Databricks on AWS, ap-northeast-1, Premium tier, DBR 16.1+
a week ago
Hey @YoshikiFujiwara, I took a look and have some meaningful feedback for you.
Short version: your diagnosis is right, and what it points to is an unsupported path, not a mistake in your IAM setup. Amazon S3 Access Points are not a supported target for Unity Catalog external locations on AWS today. The current Databricks on AWS documentation only covers external locations against standard S3 bucket paths (s3://...). There's no public doc or release note that lists S3 Access Point ARNs as a supported target, and nothing that describes special configuration for them. The behavior you captured is the known signature of this gap.
Unity Catalog doesn't hand the compute your full IAM role. It uses credential vending (down-scoping). When a query touches the external location, UC calls AWS STS AssumeRole and attaches a session policy scoped to the requested path. Your effective S3 permission is the intersection of two things:
1. Your IAM role policy, which grants s3:* on the access point ARN.
2. The session policy UC attaches, which is built around s3://bucket/prefix semantics.
That intersection is where it breaks. Standard bucket object operations authorize against arn:aws:s3:::bucket/prefix/*. Access point object operations require a different ARN namespace: arn:aws:s3:<region>:<acct>:accesspoint/<name>/object/<prefix>/*. UC's down-scoped session policy doesn't emit those access point object ARNs, and it scopes ListObjectsV2 to the root prefix only.
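The intersection described above can be modeled in a few lines. This is a deliberately simplified Python sketch: it only evaluates Allow statements with IAM-style `*` wildcards (no Deny, Condition, NotAction, or NotResource), and the bucket-style session policy is a hypothetical stand-in, not the policy UC actually generates. Region, account, access point name, and the `fsxn-ap-alias` bucket name are placeholders.

```python
import fnmatch

REGION = "ap-northeast-1"   # placeholder values, not taken from the thread
ACCOUNT = "111122223333"
AP_NAME = "fsxn-ap"


def bucket_object_arn(bucket: str, key: str) -> str:
    # Standard bucket addressing: region and account fields are empty.
    return f"arn:aws:s3:::{bucket}/{key}"


def access_point_object_arn(region: str, account: str, ap_name: str, key: str) -> str:
    # Access point addressing: region/account are set and objects sit under "/object/".
    return f"arn:aws:s3:{region}:{account}:accesspoint/{ap_name}/object/{key}"


def _as_list(value):
    return value if isinstance(value, list) else [value]


def policy_allows(policy: dict, action: str, resource: str) -> bool:
    """Simplified Allow-only check using shell-style wildcards for '*'."""
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        action_ok = any(fnmatch.fnmatchcase(action, a) for a in _as_list(stmt["Action"]))
        resource_ok = any(fnmatch.fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
        if action_ok and resource_ok:
            return True
    return False


def session_allows(role_policy: dict, session_policy: dict, action: str, resource: str) -> bool:
    # A role session's permissions are the intersection of both policies.
    return (policy_allows(role_policy, action, resource)
            and policy_allows(session_policy, action, resource))


# Role policy: s3:* on the access point and on its objects.
ROLE_POLICY = {"Statement": [{
    "Effect": "Allow",
    "Action": "s3:*",
    "Resource": [
        f"arn:aws:s3:{REGION}:{ACCOUNT}:accesspoint/{AP_NAME}",
        f"arn:aws:s3:{REGION}:{ACCOUNT}:accesspoint/{AP_NAME}/object/*",
    ],
}]}

# Hypothetical bucket-style session policy (illustrative only).
BUCKET_STYLE_SESSION = {"Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": bucket_object_arn("fsxn-ap-alias", "data/*"),
}]}

# The access point form a session policy would need to emit instead.
AP_STYLE_SESSION = {"Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": access_point_object_arn(REGION, ACCOUNT, AP_NAME, "data/*"),
}]}
```

For a request against `access_point_object_arn(REGION, ACCOUNT, AP_NAME, "data/2024/file.parquet")`, the role policy alone allows `s3:GetObject`, but `session_allows` returns False with the bucket-style session policy and True with the access point form. Widening the role policy changes nothing, because the session side never matches.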
That explains each symptom you saw:
- Top-level ls and explicit single-file reads match the narrow root-prefix scope, so they succeed.
- Subdirectory listing issues a ListObjectsV2 call that the session policy never grants, so you get UC_CLOUD_STORAGE_ACCESS_FAILURE / UNAUTHORIZED_ACCESS.
- CREATE TABLE runs an internal write and validation that the session policy denies, so you get AccessDenied.
- UC validates just enough to accept the location, but the full external-location and table workflow assumes bucket-style addressing, not access point ARN addressing.
This is also why Athena, Snowflake, and EMR work against the same access point. They use the role credentials directly (or are access-point aware) and don't impose UC's path-scoped session policy.
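Listing restrictions in S3 policies are usually expressed as a StringLike condition on the `s3:prefix` key of `s3:ListBucket`, which governs the Prefix a ListObjectsV2 call may request. The Python sketch below models only that condition, assuming (this is an assumption, since UC's generated conditions aren't public) that "root prefix only" means the empty prefix. `listing_allowed`, `REQUESTS`, and the prefixes are hypothetical.

```python
import fnmatch
from typing import Dict, List


def listing_allowed(allowed_prefixes: List[str], requested_prefix: str) -> bool:
    """Model of a StringLike condition on s3:prefix: any matching pattern passes."""
    return any(fnmatch.fnmatchcase(requested_prefix, p) for p in allowed_prefixes)


# Assumed scope: listing permitted at the location root only.
ROOT_ONLY = [""]
# What recursive discovery below the root would need instead.
ROOT_AND_BELOW = ["", "*"]

# Hypothetical ListObjectsV2 Prefix values for each operation.
REQUESTS = {
    "top-level ls": "",
    "subdirectory ls": "sales/",
    "nested ls": "sales/2024/",
}


def report(allowed_prefixes: List[str]) -> Dict[str, bool]:
    # Evaluate every hypothetical listing request against one scope.
    return {name: listing_allowed(allowed_prefixes, prefix)
            for name, prefix in REQUESTS.items()}
```

Under `ROOT_ONLY`, only the top-level listing passes, which lines up with the first two symptoms above; under `ROOT_AND_BELOW`, all three pass.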
If you go looking, you'll find an access_point attribute that injects the AP ARN into the session policy and partially improves things. It's what makes top-level listing and file reads succeed. Don't build on it. Per Databricks Support, that field was never released as GA and has been removed from the documentation. The partial success is a side effect of incomplete internal handling, not a supported code path. It won't get you subdirectory listing or table creation.
Your source is FSx for NetApp ONTAP exposed through an S3 Access Point, so there's no plain S3 bucket underneath to register directly. With that constraint, here's the path I'd take:
1. Keep staging the files into a standard S3 bucket (your current DataSync path) and register that bucket as the UC external location.
2. Use Auto Loader (cloudFiles) to ingest only new files into UC managed or external tables. That restores the full governance layer (lineage, fine-grained ACLs, row and column masking) that the access point path can't give you today.
The bottom line: no IAM tweak will fix this, because the block is in UC's session-policy generation, not your role. Until S3 Access Points are a supported external-location target, standard S3 with Auto Loader into UC tables is the durable, fully governed pattern.
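The Auto Loader step could look roughly like the following PySpark sketch. The `cloudFiles` format and its `cloudFiles.format`, `cloudFiles.schemaLocation`, and `cloudFiles.useNotifications` options are Auto Loader's own. The function names, the default of reusing the checkpoint path as the schema location, and the choice of an availableNow trigger are illustrative assumptions, not a prescribed setup.

```python
def autoloader_options(source_format: str, schema_location: str,
                       use_notifications: bool = False) -> dict:
    """Build Auto Loader reader options (hypothetical helper)."""
    opts = {
        "cloudFiles.format": source_format,           # e.g. "parquet", "csv", "json"
        "cloudFiles.schemaLocation": schema_location,  # where the inferred schema is tracked
    }
    if use_notifications:
        # File notification mode instead of directory listing.
        opts["cloudFiles.useNotifications"] = "true"
    return opts


def start_incremental_ingest(spark, source_path: str, target_table: str,
                             checkpoint_path: str, source_format: str = "parquet"):
    """Ingest only files not yet seen (tracked in the checkpoint) into a UC table."""
    opts = autoloader_options(source_format, schema_location=checkpoint_path)
    return (
        spark.readStream.format("cloudFiles")
        .options(**opts)
        .load(source_path)                             # the staged S3 prefix
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)                    # process the backlog, then stop
        .toTable(target_table)                         # catalog.schema.table in UC
    )
```

In a Databricks notebook, where `spark` is predefined, a call such as `start_incremental_ingest(spark, "s3://staging-bucket/fsxn/", "main.raw.fsxn_files", "s3://staging-bucket/_checkpoints/fsxn/")` would pick up newly staged files and then stop. All three paths and the table name are placeholders. Scheduling it as a job after each DataSync run keeps the copy and the ingest in step.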
Cheers, Louis.
Tuesday
Thank you @Louis_Frolio — this is exactly the clarity I was looking for. Your explanation of how UC's session policy generates arn:aws:s3:::bucket/prefix/* while Access Points require the arn:aws:s3:<region>:<acct>:accesspoint/<name>/object/<prefix>/* namespace confirms the root cause we couldn't verify without internal context.
A few follow-ups:
1. Feature request filed: I've opened a case with our Databricks account team referencing this thread and the repro evidence. Hopefully it helps prioritize.
2. Incremental staging: Your DataSync → Auto Loader suggestion is what we're running now. For others reading: ONTAP FPolicy (file event notification) → SQS → Lambda can also trigger incremental ingestion without full-directory scans — useful when the source has millions of files but few changes per hour.
3. OpenSharing path: One adjacent development — OpenSharing (announced at DAIS as the Delta Sharing evolution under the Linux Foundation) defines a credential vending model where the server issues scoped STS credentials directly. The recipient calls standard S3 APIs with those credentials, which operates independently of UC's credential vending path. I validated reads against the same FSx S3 AP via this pattern. Note that this is read-only and outside UC governance, but for cross-platform sharing it may complement the UC path until native support arrives. Details in the repo linked above.
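For the FPolicy → SQS → Lambda variant in item 2, the Lambda side can stay small: read the SQS batch, keep only events that imply new or changed data, and collapse them into the set of directories worth re-syncing. The sketch below uses the standard Lambda SQS event shape (`Records[].body`). The JSON message format (`path`, `op`) and the operation names are hypothetical, because they depend on whatever relays FPolicy notifications into SQS. The step that actually kicks off DataSync or a Databricks job is left as a comment.

```python
import json
import posixpath

# Hypothetical operation names emitted by the FPolicy-to-SQS relay.
INGEST_OPS = {"create", "write", "rename"}


def changed_prefixes(event: dict) -> list:
    """Collapse an SQS batch of file events into distinct parent directories."""
    prefixes = set()
    for record in event.get("Records", []):
        msg = json.loads(record["body"])  # assumed: {"path": "...", "op": "..."}
        if msg.get("op") in INGEST_OPS:
            prefixes.add(posixpath.dirname(msg["path"]) + "/")
    return sorted(prefixes)


def lambda_handler(event, context):
    prefixes = changed_prefixes(event)
    # Hypothetical next step: start a DataSync task filtered to these prefixes,
    # or trigger the Auto Loader job, instead of scanning the whole volume.
    return {"changed_prefixes": prefixes, "count": len(prefixes)}
```

Deduplicating by directory means a burst of writes into one folder results in a single targeted sync rather than one per file.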
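On the recipient side of the OpenSharing path in item 3, "standard S3 APIs with those credentials" amounts to building an S3 client from temporary credentials and addressing the access point by ARN, which the AWS SDKs accept in place of a bucket name. The sketch assumes the vended credentials arrive in the STS field layout (`AccessKeyId`, `SecretAccessKey`, `SessionToken`, ISO-8601 `Expiration`). That shape is an assumption, not the OpenSharing wire format, and all helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def boto3_credential_kwargs(creds: dict) -> dict:
    """Map STS-style temporary credentials to boto3 client keyword arguments."""
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def credentials_expired(creds: dict, now: Optional[datetime] = None) -> bool:
    # Assumes an ISO-8601 string such as "2025-01-01T00:00:00Z".
    expires = datetime.fromisoformat(creds["Expiration"].replace("Z", "+00:00"))
    return (now or datetime.now(timezone.utc)) >= expires


def read_via_access_point(creds: dict, access_point_arn: str, key: str, region: str) -> bytes:
    import boto3  # imported lazily so the helpers above work without boto3
    if credentials_expired(creds):
        raise RuntimeError("vended credentials have expired; request a new set")
    s3 = boto3.client("s3", region_name=region, **boto3_credential_kwargs(creds))
    # The access point ARN goes where a bucket name normally would.
    response = s3.get_object(Bucket=access_point_arn, Key=key)
    return response["Body"].read()
```

Because the scoping lives in whatever policy the sharing server attached when it issued the credentials, this path never passes through UC's session-policy generation. For the same reason, it also bypasses UC governance, as noted above.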
Thanks again for confirming this is a product gap, not misconfiguration. That helps us architect the right workarounds.
Tuesday
Following this, I'm curious if anyone has gotten this working.