Databricks Community

YoshikiFujiwara · 2 weeks ago

Context

I'm working on integration patterns between enterprise NAS storage (Amazon FSx for NetApp ONTAP) and Databricks via S3 Access Points. S3 Access Points provide S3 API access to file data without copying — a common pattern for organizations with existing NFS/SMB workloads.

I've documented my findings publicly here: https://github.com/Yoshiki0705/fsxn-lakehouse-integrations

What I've observed

When registering an S3 Access Point as a UC External Location:

Yes: External Location creation succeeds
Yes: Top-level file listing works
Yes: Explicit file reads (specifying full path) work
No: Subdirectory listing fails with UC_CLOUD_STORAGE_ACCESS_FAILURE
No: CREATE TABLE fails with AccessDenied

The behavior suggests that the session policy generated during AssumeRole doesn't correctly handle the S3 AP ARN format (arn:aws:s3:REGION:ACCOUNT:accesspoint/NAME).

Technical details

Databricks on AWS, Premium tier
Unity Catalog enabled
Storage Credential: IAM Role with full s3:* on the AP ARN
S3 AP type: Internet-origin (same as what works with Athena/Snowflake/EMR)
The same data is queryable via Athena, Snowflake External Stage, and EMR Serverless without issues

Questions for the community

Has anyone successfully registered an S3 Access Point (not a standard S3 bucket) as a UC External Location? If so, what configuration was needed?
Is there a documented limitation or roadmap item for S3 AP support in UC? I couldn't find this in the current documentation.
For those using the Storage Ecosystem partners (announced at DAIS 2026) — does the native integration bypass this limitation by using a different registration path?

Current workaround

I'm currently using DataSync → standard S3 bucket → UC External Location, which works but introduces a data copy. For read-only analytics, Athena and Snowflake can query the S3 AP directly, so this is specifically a UC limitation.

Why this matters to the community

Many organizations store data on enterprise NAS (NFS/SMB) and want to use Databricks for ML/AI without duplicating everything to S3. S3 Access Points are designed exactly for this "access without copy" pattern. If UC could support S3 AP ARNs, it would enable zero-copy governed analytics on enterprise file storage — benefiting anyone with NAS-resident data.

Environment: Databricks on AWS, ap-northeast-1, Premium tier, DBR 16.1+

Louis_Frolio · a week ago

Hey @YoshikiFujiwara, I took a look and have some meaningful feedback for you.

Short version: your diagnosis is right, and what it points to is an unsupported path, not a mistake in your IAM setup. Amazon S3 Access Points are not a supported target for Unity Catalog external locations on AWS today. The current Databricks on AWS docs only cover external locations against standard S3 bucket paths (s3://...). There's no public doc or release note that lists S3 Access Point ARNs as a supported target, and nothing that describes special configuration for them. The behavior you captured is the known signature of this gap.

Why it behaves this way

Unity Catalog doesn't hand the compute your full IAM role. It uses credential vending (down-scoping). When a query touches the external location, UC calls AWS STS AssumeRole and attaches a session policy scoped to the requested path. Your effective S3 permission is the intersection of two things:

Your IAM role's identity policy, which you've correctly set to s3:* on the access point ARN.
UC's generated session policy, which is built from standard s3://bucket/prefix semantics.

That intersection is where it breaks. Standard bucket object operations authorize against arn:aws:s3:::bucket/prefix/*. Access point object operations require a different ARN namespace: arn:aws:s3:<region>:<acct>:accesspoint/<name>/object/<prefix>/*. UC's down-scoped session policy doesn't emit those access point object ARNs, and it scopes listing (ListObjectsV2, which IAM authorizes through the s3:ListBucket action) to the root prefix only.
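
To make the namespace mismatch concrete, here is a minimal Python sketch. It is a toy model, not UC's actual policy generator and not real IAM evaluation (which also handles explicit denies, conditions, and more). The account ID, access point name, and prefix are made-up placeholders, and glob matching via fnmatch only approximates IAM wildcard matching.

```python
from fnmatch import fnmatchcase


def bucket_object_arn(bucket: str, prefix: str) -> str:
    """ARN pattern for objects under a prefix in a standard S3 bucket."""
    return f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*"


def access_point_object_arn(region: str, account: str, name: str, prefix: str) -> str:
    """ARN pattern for objects under a prefix reached through an access point."""
    return (
        f"arn:aws:s3:{region}:{account}:accesspoint/{name}"
        f"/object/{prefix.strip('/')}/*"
    )


def allowed(resource_arn: str, identity_patterns: list, session_patterns: list) -> bool:
    """Toy intersection: a request passes only if BOTH policies match it."""
    in_identity = any(fnmatchcase(resource_arn, p) for p in identity_patterns)
    in_session = any(fnmatchcase(resource_arn, p) for p in session_patterns)
    return in_identity and in_session


# Placeholder values (assumptions for illustration only).
REGION, ACCOUNT, AP_NAME = "ap-northeast-1", "111122223333", "fsxn-ap"
AP_ARN = f"arn:aws:s3:{REGION}:{ACCOUNT}:accesspoint/{AP_NAME}"

# Identity policy: s3:* on the access point and everything under it.
identity = [AP_ARN, f"{AP_ARN}/object/*"]

# The object a reader would actually request through the access point.
request = f"{AP_ARN}/object/data/part-0.parquet"

# A bucket-style session resource never matches the access point namespace.
print(allowed(request, identity, [bucket_object_arn(AP_NAME, "data")]))  # False

# An access-point-style session resource would match.
print(allowed(request, identity,
              [access_point_object_arn(REGION, ACCOUNT, AP_NAME, "data")]))  # True
```

The point of the sketch is only that a broad identity policy cannot rescue a request that the session policy's resource list fails to cover.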

That explains each symptom you saw:

Top-level ls and explicit single-file reads match the narrow root-prefix scope, so they succeed.
Subdirectory listing needs prefix-level ListObjectsV2 that the session policy never grants, so you get UC_CLOUD_STORAGE_ACCESS_FAILURE / UNAUTHORIZED_ACCESS.
CREATE TABLE runs an internal write and validation that the session policy denies, so you get AccessDenied.

UC validates just enough to accept the location, but the full external-location and table workflow assumes bucket-style addressing, not access point ARN addressing. This is also why Athena, Snowflake, and EMR work against the same access point. They use the role credentials directly (or are access-point aware) and don't impose UC's path-scoped session policy.

A caution about the access_point field

If you go looking, you'll find an access_point attribute that injects the AP ARN into the session policy and partially improves things. It's what makes top-level listing and file reads succeed. Don't build on it. Per Databricks Support, that field was never released as GA and has been removed from the documentation. The partial success is a side effect of incomplete internal handling, not a supported code path. It won't get you subdirectory listing or table creation.

What I'd do from here

Your source is FSx for NetApp ONTAP exposed through an S3 Access Point, so there's no plain S3 bucket underneath to register directly. With that constraint, here's the path I'd take:

Keep the AWS-native engines for in-place reads. Athena, Snowflake, and EMR are fine wherever you don't need UC governance.
Stage into standard S3, then govern in UC. This is your DataSync workaround, refined. To address the duplication concern, make it incremental instead of a full copy: land data in a standard S3 bucket and use Auto Loader (cloudFiles) to ingest only new files into UC managed or external tables. That restores the full governance layer (lineage, fine-grained ACLs, row and column masking) the access point path can't give you today.
File a feature request with your Databricks account team for native S3 Access Point support in UC credential vending. Attach the repro details you've already collected and track it under a support case. This is a real product gap, not user error.
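
For the staging option above, a rough PySpark sketch of the Auto Loader piece might look like the following. Treat it as a starting point under assumptions: the bucket path, checkpoint path, and table name are hypothetical, the file format is assumed to be Parquet, and `spark` is the session a Databricks notebook or job already provides.

```python
def autoloader_options(file_format: str, schema_location: str) -> dict:
    """Reader options for Auto Loader (the cloudFiles source)."""
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
        # Pick up files already in the bucket on the first run.
        "cloudFiles.includeExistingFiles": "true",
    }


def ingest_new_files(spark, source_path, target_table, checkpoint_path,
                     file_format="parquet"):
    """Load only files not yet recorded in the checkpoint, then stop."""
    reader = spark.readStream.format("cloudFiles")
    options = autoloader_options(file_format, checkpoint_path + "/_schema")
    for key, value in options.items():
        reader = reader.option(key, value)
    return (
        reader.load(source_path)
        .writeStream
        # The checkpoint tracks which files were already ingested.
        .option("checkpointLocation", checkpoint_path)
        # Process everything currently available, then shut down.
        .trigger(availableNow=True)
        .toTable(target_table)
    )


# Example call inside Databricks (all names below are hypothetical):
# ingest_new_files(
#     spark,
#     source_path="s3://example-staging-bucket/fsxn/",
#     target_table="main.staging.fsxn_files",
#     checkpoint_path="s3://example-staging-bucket/_checkpoints/fsxn",
# )
```

Running it on a schedule with the availableNow trigger keeps each run bounded to whatever DataSync has landed since the previous checkpoint.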

The bottom line: no IAM tweak will fix this, because the block is in UC's session-policy generation, not your role. Until S3 Access Points are a supported external-location target, standard S3 with Auto Loader into UC tables is the durable, fully governed pattern.

Cheers, Louis.

YoshikiFujiwara · Tuesday

Thank you @Louis_Frolio — this is exactly the clarity I was looking for. Your explanation of how UC's session policy generates arn:aws:s3:::bucket/prefix/* while Access Points require the arn:aws:s3:<region>:<acct>:accesspoint/<name>/object/<prefix>/* namespace confirms the root cause we couldn't verify without internal context.

A few follow-ups:

1. Feature request filed: I've opened a case with our Databricks account team referencing this thread and the repro evidence. Hopefully it helps prioritize.

2. Incremental staging: Your DataSync → Auto Loader suggestion is what we're running now. For others reading: ONTAP FPolicy (file event notification) → SQS → Lambda can also trigger incremental ingestion without full-directory scans — useful when the source has millions of files but few changes per hour.

3. OpenSharing path: One adjacent development — OpenSharing (announced at DAIS as the Delta Sharing evolution under the Linux Foundation) defines a credential vending model where the server issues scoped STS credentials directly. The recipient calls standard S3 APIs with those credentials, which operates independently of UC's credential vending path. I validated reads against the same FSx S3 AP via this pattern. Note that this is read-only and outside UC governance, but for cross-platform sharing it may complement the UC path until native support arrives. Details in the repo linked above.
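
For anyone trying the FPolicy → SQS → Lambda idea from point 2, here is a minimal sketch of the Lambda side. The outer event shape (Records[].body) is the standard SQS-to-Lambda format, but the message body fields ("path", "operation") are purely an assumption for illustration; the real payload depends on whatever forwards FPolicy events into SQS in your setup.

```python
import json

# ASSUMPTION: operations that mean "this file should be (re)ingested".
INGEST_OPERATIONS = {"create", "write", "rename"}


def changed_paths(event: dict) -> list:
    """Collect unique file paths from an SQS-triggered Lambda event, in order."""
    seen = {}
    for record in event.get("Records", []):
        # Each SQS record carries the message as a JSON string in "body".
        body = json.loads(record["body"])
        # ASSUMPTION: hypothetical message schema with "path" and "operation".
        if body.get("operation") in INGEST_OPERATIONS:
            seen.setdefault(body["path"], None)
    return list(seen)


def handler(event, context):
    """Lambda entry point: return the files that need incremental ingestion."""
    paths = changed_paths(event)
    # A real handler would hand these paths to the copy or ingestion step here.
    return {"changed": paths, "count": len(paths)}
```

Deduplicating per batch matters because a single file save can emit several events in quick succession.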

Thanks again for confirming this is a product gap, not misconfiguration. That helps us architect the right workarounds.