Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unity Catalog External Location with Amazon S3 Access Points: session policy behavior and workarounds

YoshikiFujiwara
New Contributor II

Context

I'm working on integration patterns between enterprise NAS storage (Amazon FSx for NetApp ONTAP) and Databricks via S3 Access Points. S3 Access Points provide S3 API access to file data without copying, a common pattern for organizations with existing NFS/SMB workloads.

I've documented my findings publicly here: https://github.com/Yoshiki0705/fsxn-lakehouse-integrations

What I've observed

When registering an S3 Access Point as a UC External Location:

  • Yes: External Location creation succeeds
  • Yes: Top-level file listing works
  • Yes: Explicit file reads (specifying full path) work
  • No: Subdirectory listing fails with UC_CLOUD_STORAGE_ACCESS_FAILURE
  • No: CREATE TABLE fails with AccessDenied

The behavior suggests that the session policy generated during AssumeRole doesn't correctly handle the S3 AP ARN format (arn:aws:s3:REGION:ACCOUNT:accesspoint/NAME).

Technical details

  • Databricks on AWS, Premium tier
  • Unity Catalog enabled
  • Storage Credential: IAM Role with full s3:* on the AP ARN
  • S3 AP type: Internet-origin (same as what works with Athena/Snowflake/EMR)
  • The same data is queryable via Athena, Snowflake External Stage, and EMR Serverless without issues

Questions for the community

  1. Has anyone successfully registered an S3 Access Point (not a standard S3 bucket) as a UC External Location? If so, what configuration was needed?

  2. Is there a documented limitation or roadmap item for S3 AP support in UC? I couldn't find this in the current documentation.

  3. For those using the Storage Ecosystem partners (announced at DAIS 2026), does the native integration bypass this limitation by using a different registration path?

Current workaround

I'm currently using DataSync → standard S3 bucket → UC External Location, which works but introduces a data copy. For read-only analytics, Athena and Snowflake can query the S3 AP directly, so this is specifically a UC limitation.

Why this matters to the community

Many organizations store data on enterprise NAS (NFS/SMB) and want to use Databricks for ML/AI without duplicating everything to S3. S3 Access Points are designed exactly for this "access without copy" pattern. If UC could support S3 AP ARNs, it would enable zero-copy governed analytics on enterprise file storage, benefiting anyone with NAS-resident data.

Environment: Databricks on AWS, ap-northeast-1, Premium tier, DBR 16.1+

1 REPLY

Louis_Frolio
Databricks Employee

Hey @YoshikiFujiwara , I took a look and have some meaningful feedback for you.

Short version: your diagnosis is right, and what it points to is an unsupported path, not a mistake in your IAM setup. Amazon S3 Access Points are not a supported target for Unity Catalog external locations on AWS today. The current AWS docs only cover external locations against standard S3 bucket paths (s3://...). There's no public doc or release note that lists S3 Access Point ARNs as a supported target, and nothing that describes special configuration for them. The behavior you captured is the known signature of this gap.

Why it behaves this way

Unity Catalog doesn't hand the compute your full IAM role. It uses credential vending (down-scoping). When a query touches the external location, UC calls AWS STS AssumeRole and attaches a session policy scoped to the requested path. Your effective S3 permission is the intersection of two things:

  1. Your IAM role's identity policy, which you've correctly set to s3:* on the access point ARN.
  2. UC's generated session policy, which is built from standard s3://bucket/prefix semantics.

That intersection is where it breaks. Standard bucket object operations authorize against arn:aws:s3:::bucket/prefix/*. Access point object operations require a different ARN namespace: arn:aws:s3:<region>:<acct>:accesspoint/<name>/object/<prefix>/*. UC's down-scoped session policy doesn't emit those access point object ARNs, and it scopes ListObjectsV2 to the root prefix only.
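
To make the ARN mismatch concrete, here is a minimal Python sketch of the intersection rule. It is a toy model, not UC's real policy engine: it only compares resource patterns (no actions, conditions, or explicit denies), and the account ID, bucket name, and access point name are hypothetical placeholders. It does not reproduce the session policy UC actually generates, which is not public, so it does not model the partial successes described in this thread. It only shows why a bucket-style resource pattern can never match an access point object ARN.

```python
from fnmatch import fnmatchcase


def bucket_object_arn(bucket: str, key: str) -> str:
    """ARN that a standard bucket object operation authorizes against."""
    return f"arn:aws:s3:::{bucket}/{key}"


def access_point_object_arn(region: str, account: str, name: str, key: str) -> str:
    """ARN that an access point object operation authorizes against."""
    return f"arn:aws:s3:{region}:{account}:accesspoint/{name}/object/{key}"


def allowed(resource_patterns: list[str], arn: str) -> bool:
    """True if any pattern matches the ARN (IAM-style * and ? wildcards)."""
    return any(fnmatchcase(arn, pattern) for pattern in resource_patterns)


def effective_allow(identity: list[str], session: list[str], arn: str) -> bool:
    """A request passes only if BOTH the identity and session policy allow it."""
    return allowed(identity, arn) and allowed(session, arn)


# Hypothetical placeholders; substitute your own values.
AP = "arn:aws:s3:ap-northeast-1:111122223333:accesspoint/example-ap"

# Identity policy: full access on the access point and its objects.
IDENTITY = [AP, f"{AP}/object/*"]

# Assumed shape of a session policy built from s3://bucket/prefix semantics.
BUCKET_STYLE_SESSION = ["arn:aws:s3:::example-bucket/data/*"]

# What an access-point-aware session policy would need to contain instead.
AP_AWARE_SESSION = [f"{AP}/object/data/*"]

target = access_point_object_arn(
    "ap-northeast-1", "111122223333", "example-ap", "data/part-0000.parquet"
)

print(effective_allow(IDENTITY, BUCKET_STYLE_SESSION, target))  # False
print(effective_allow(IDENTITY, AP_AWARE_SESSION, target))      # True
```

The identity policy is identical in both calls. Only the session policy's resource namespace changes the outcome, which is why widening the role's own permissions has no effect.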

That explains each symptom you saw:

  • Top-level ls and explicit single-file reads match the narrow root-prefix scope, so they succeed.
  • Subdirectory listing needs prefix-level ListObjectsV2 that the session policy never grants, so you get UC_CLOUD_STORAGE_ACCESS_FAILURE / UNAUTHORIZED_ACCESS.
  • CREATE TABLE runs an internal write and validation that the session policy denies, so you get AccessDenied.

UC validates just enough to accept the location, but the full external-location and table workflow assumes bucket-style addressing, not access point ARN addressing. This is also why Athena, Snowflake, and EMR work against the same access point. They use the role credentials directly (or are access-point aware) and don't impose UC's path-scoped session policy.
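
If you want to confirm outside Databricks that the role itself works under down-scoping when the session policy uses access point ARNs, a sketch like the following may help. It assumes boto3 is installed and that your caller is allowed to assume the role. The function names, the session name, and the policy shape are my own illustration, not what UC generates. It also assumes a non-empty prefix.

```python
import json


def access_point_session_policy(ap_arn: str, prefix: str) -> dict:
    """Build a session policy scoped to one prefix, using access point ARNs.

    Hypothetical shape for testing only; not UC's generated policy.
    """
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object reads use the accesspoint/<name>/object/... namespace.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{ap_arn}/object/{prefix}/*"],
            },
            {
                # Listing is authorized on the access point ARN itself,
                # narrowed to the prefix with the s3:prefix condition key.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [ap_arn],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


def list_with_downscoped_role(role_arn: str, ap_arn: str, prefix: str) -> list[str]:
    """Assume the role with the session policy attached, then list the prefix."""
    import boto3  # imported lazily so the policy builder works without boto3

    policy = access_point_session_policy(ap_arn, prefix)
    creds = boto3.client("sts").assume_role(
        RoleArn=role_arn,
        RoleSessionName="ap-session-policy-test",
        Policy=json.dumps(policy),
    )["Credentials"]

    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    # boto3 accepts an access point ARN in the Bucket parameter.
    resp = s3.list_objects_v2(Bucket=ap_arn, Prefix=f"{prefix.strip('/')}/")
    return [obj["Key"] for obj in resp.get("Contents", [])]
```

If subdirectory listing succeeds here but fails through the external location, that is consistent with the block being in the generated session policy rather than in the role.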

A caution about the access_point field

If you go looking, you'll find an access_point attribute that injects the AP ARN into the session policy and partially improves things. It's what makes top-level listing and file reads succeed. Don't build on it. Per Databricks Support, that field was never released as GA and has been removed from the documentation. The partial success is a side effect of incomplete internal handling, not a supported code path. It won't get you subdirectory listing or table creation.

What I'd do from here

Your source is FSx for NetApp ONTAP exposed through an S3 Access Point, so there's no plain S3 bucket underneath to register directly. With that constraint, here's the path I'd take:

  1. Keep the AWS-native engines for in-place reads. Athena, Snowflake, and EMR are fine wherever you don't need UC governance.
  2. Stage into standard S3, then govern in UC. This is your DataSync workaround, refined. To address the duplication concern, make it incremental instead of a full copy: land data in a standard S3 bucket and use Auto Loader (cloudFiles) to ingest only new files into UC managed or external tables. That restores the full governance layer (lineage, fine-grained ACLs, row and column masking) the access point path can't give you today.
  3. File a feature request with your Databricks account team for native S3 Access Point support in UC credential vending. Attach the repro details you've already collected and track it under a support case. This is a real product gap, not user error.
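
A minimal sketch of step 2, assuming the staged files are Parquet and that DataSync has already landed them in a standard S3 bucket registered as a UC external location. The paths, table name, and helper names are hypothetical, and `spark` is the session a Databricks notebook or job provides. Treat it as a starting point rather than a tuned pipeline.

```python
def autoloader_options(file_format: str, schema_location: str) -> dict:
    """Options for the Auto Loader (cloudFiles) source."""
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader persists the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


def ingest_new_files(spark, source_path, target_table, checkpoint_path,
                     file_format="parquet"):
    """Ingest only files not yet processed, then stop.

    The checkpoint records which files were already ingested, so each
    run picks up just the files added since the previous run.
    """
    reader = spark.readStream.format("cloudFiles")
    options = autoloader_options(file_format, f"{checkpoint_path}/schema")
    for key, value in options.items():
        reader = reader.option(key, value)

    return (
        reader.load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)  # process the backlog, then terminate
        .toTable(target_table)
    )


# Example call in a Databricks notebook (all names are placeholders):
# ingest_new_files(
#     spark,
#     source_path="s3://example-staging-bucket/fsxn/",
#     target_table="main.staging.fsxn_files",
#     checkpoint_path="s3://example-staging-bucket/_checkpoints/fsxn",
# )
```

Running this on a schedule after each DataSync task keeps the Databricks side incremental. The copy into S3 still exists, but nothing is reprocessed.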

The bottom line: no IAM tweak will fix this, because the block is in UC's session-policy generation, not your role. Until S3 Access Points are a supported external-location target, standard S3 with Auto Loader into UC tables is the durable, fully governed pattern.

Cheers, Louis.