
Unity Catalog External Location with Amazon S3 Access Points: session policy behavior and workarounds

YoshikiFujiwara
New Contributor II

Context

I'm working on integration patterns between enterprise NAS storage (Amazon FSx for NetApp ONTAP) and Databricks via S3 Access Points. S3 Access Points provide S3 API access to file data without copying, a common pattern for organizations with existing NFS/SMB workloads.

I've documented my findings publicly here: https://github.com/Yoshiki0705/fsxn-lakehouse-integrations

What I've observed

When registering an S3 Access Point as a UC External Location:

  • Yes: External Location creation succeeds
  • Yes: Top-level file listing works
  • Yes: Explicit file reads (specifying full path) work
  • No: Subdirectory listing fails with UC_CLOUD_STORAGE_ACCESS_FAILURE
  • No: CREATE TABLE fails with AccessDenied

This behavior suggests that the session policy generated during AssumeRole doesn't correctly handle the S3 AP ARN format (arn:aws:s3:REGION:ACCOUNT:accesspoint/NAME).
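To make the ARN difference concrete, here is a minimal sketch that builds the resource ARNs a down-scoped policy would need for an access point and contrasts them with the bucket-style ARNs. It is an illustration only: the policy shape is an assumption and not the policy Databricks actually generates, and the account ID, access point name, and bucket name are hypothetical placeholders.

```python
import json


def access_point_arns(region, account_id, name, prefix=""):
    """Return (container ARN, object ARN pattern) for an S3 Access Point."""
    ap_arn = f"arn:aws:s3:{region}:{account_id}:accesspoint/{name}"
    # Objects behind an access point are addressed under ".../object/<key>".
    return ap_arn, f"{ap_arn}/object/{prefix}*"


def bucket_arns(bucket, prefix=""):
    """Return (container ARN, object ARN pattern) for a standard bucket."""
    return f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/{prefix}*"


def sketch_policy(container_arn, object_arn, prefix=""):
    """Assumed shape of a prefix-scoped policy (not Databricks' real one)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [container_arn],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [object_arn],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    ap = access_point_arns("ap-northeast-1", "123456789012", "my-fsxn-ap", "data/")
    bucket = bucket_arns("my-bucket", "data/")
    print(json.dumps(sketch_policy(*ap, prefix="data/"), indent=2))
    print(json.dumps(sketch_policy(*bucket, prefix="data/"), indent=2))
```

A policy whose resources are written in the bucket-style form does not name the access point resource at all, which is one possible explanation for the partial failures listed above. I have not confirmed that this is what happens inside UC.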

Technical details

  • Databricks on AWS, Premium tier
  • Unity Catalog enabled
  • Storage Credential: IAM Role with full s3:* on the AP ARN
  • S3 AP type: Internet-origin (same as what works with Athena/Snowflake/EMR)
  • The same data is queryable via Athena, Snowflake External Stage, and EMR Serverless without issues
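One way to separate IAM role problems from UC behavior is to list a subdirectory prefix through the access point with the storage credential role's own credentials, outside Unity Catalog. The sketch below assumes boto3 and takes the client as a parameter; the ARN and prefix in the usage comment are hypothetical placeholders.

```python
def list_prefix(s3_client, access_point_arn, prefix):
    """List one level under `prefix` through an S3 Access Point.

    boto3 accepts an access point ARN in place of a bucket name.
    Only the first page of results is returned (no pagination).
    """
    resp = s3_client.list_objects_v2(
        Bucket=access_point_arn,
        Prefix=prefix,
        Delimiter="/",  # group deeper keys into CommonPrefixes ("subdirectories")
    )
    files = [obj["Key"] for obj in resp.get("Contents", [])]
    subdirs = [cp["Prefix"] for cp in resp.get("CommonPrefixes", [])]
    return files, subdirs


# Example usage (requires boto3 and credentials for the role under test):
#   import boto3
#   s3 = boto3.client("s3", region_name="ap-northeast-1")
#   arn = "arn:aws:s3:ap-northeast-1:123456789012:accesspoint/my-fsxn-ap"
#   print(list_prefix(s3, arn, "subdir/"))
```

If this call succeeds with the role's credentials while the same subdirectory listing fails through the External Location, that points at the credentials UC vends rather than at the role's own permissions.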

Questions for the community

  1. Has anyone successfully registered an S3 Access Point (not a standard S3 bucket) as a UC External Location? If so, what configuration was needed?

  2. Is there a documented limitation or roadmap item for S3 AP support in UC? I couldn't find this in the current documentation.

  3. For those using the Storage Ecosystem partners (announced at DAIS 2026): does the native integration bypass this limitation by using a different registration path?

Current workaround

I'm currently using DataSync → standard S3 bucket → UC External Location, which works but introduces a data copy. For read-only analytics, Athena and Snowflake can query the S3 AP directly, so this is specifically a UC limitation.

Why this matters to the community

Many organizations store data on enterprise NAS (NFS/SMB) and want to use Databricks for ML/AI without duplicating everything to S3. S3 Access Points are designed exactly for this "access without copy" pattern. If UC could support S3 AP ARNs, it would enable zero-copy governed analytics on enterprise file storage, benefiting anyone with NAS-resident data.

Environment: Databricks on AWS, ap-northeast-1, Premium tier, DBR 16.1+
