
How to create an external location that accesses a public S3 bucket

deano2025
New Contributor II

Hi,

I'm trying to create an external location that accesses a public S3 bucket (for open data). However, I'm not having any success. I'm confused about what to specify as the storage credential (IAM role), since it's a public bucket that is out of my control. By the way, I can easily read the data directly using PySpark, e.g. by calling spark.read.json, but I thought it made sense to set up an external location first. Any ideas on the steps to take? Or is using an external location a waste of time in this case?
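For reference, this is roughly how I'm reading it directly today (sketch only: the bucket path is a placeholder, and depending on the cluster setup the anonymous-credentials setting may not be needed, or may have to go in the cluster's Spark config instead):

# Sketch of reading a public S3 bucket anonymously with PySpark in a Databricks notebook.
# "some-public-bucket" and the path are placeholders, not a real dataset.
spark.conf.set(
    "fs.s3a.aws.credentials.provider",
    "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider",
)
df = spark.read.json("s3a://some-public-bucket/open-data/*.json")
df.printSchema()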

Thanks!

2 REPLIES

Isi
Contributor III

Hey @deano2025 

Using an external location for a public S3 bucket is usually unnecessary.

External locations are designed to:

  • Govern access to private cloud storage (S3, ADLS, GCS)

  • Map Unity Catalog permissions to cloud-level security via storage credentials

  • Work with managed tables, volumes, Delta Sharing, etc.

 

They rely on a Storage Credential, which is usually an IAM role or access key that grants access to a private bucket.

But if you still want to create an External Location, you can create a dummy Storage Credential using a placeholder ARN like:

arn:aws:iam::123141241214124:role/role_test

Then, use that credential when defining your External Location.
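If you go that route, the definition itself is only a couple of statements. This is just a sketch with placeholder names: the dummy credential ("dummy_cred") has to be registered in Unity Catalog first (e.g. in Catalog Explorer), and the location name, URL, and group are illustrative.

# Sketch: define the External Location on top of the dummy credential, then
# optionally grant read access. All names and the URL below are placeholders.
spark.sql("""
  CREATE EXTERNAL LOCATION IF NOT EXISTS open_data_loc
  URL 's3://some-public-bucket/open-data/'
  WITH (STORAGE CREDENTIAL dummy_cred)
  COMMENT 'Public open-data bucket'
""")
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION open_data_loc TO `data_engineers`")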

However, since the bucket is public, you'll get the same result as simply reading it directly with spark.read...

Hope this helps to clarify 🙂

Isi

 

deano2025
New Contributor II

Thanks @Isi! Now that you've explained external locations, it does indeed make sense that they're probably unnecessary in this case. Thanks for clarifying!
