<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to create an external location that accesses a public s3 bucket in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-create-an-external-location-that-accesses-a-public-s3/m-p/118462#M45648</link>
    <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/163767"&gt;@deano2025&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P class=""&gt;Using an external location for a public S3 bucket is usually unnecessary.&lt;/P&gt;&lt;P class=""&gt;External locations are designed to:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P class=""&gt;Govern access to &lt;SPAN class=""&gt;&lt;STRONG&gt;private cloud storage&lt;/STRONG&gt;&lt;/SPAN&gt; (S3, ADLS, GCS)&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;Map &lt;SPAN class=""&gt;&lt;STRONG&gt;Unity Catalog permissions&lt;/STRONG&gt;&lt;/SPAN&gt; to cloud-level security via storage credentials&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;SPAN class=""&gt;Work with &lt;/SPAN&gt;&lt;STRONG&gt;managed tables, volumes, Delta Sharing&lt;/STRONG&gt;&lt;SPAN class=""&gt;, etc.&lt;/SPAN&gt;&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P class=""&gt;They rely on a &lt;SPAN class=""&gt;Storage Credential&lt;/SPAN&gt;, which is usually an IAM role or access key &lt;SPAN class=""&gt;that grants access to a private bucket.&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;If you still want to create an External Location, you can create a &lt;SPAN class=""&gt;dummy Storage Credential&lt;/SPAN&gt; using a placeholder ARN like:&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;arn:aws:iam::123141241214124:role/role_test&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;Then, use that credential when defining your &lt;SPAN class=""&gt;&lt;STRONG&gt;External Location&lt;/STRONG&gt;&lt;/SPAN&gt;.&lt;/P&gt;&lt;P class=""&gt;However, since the bucket is public, you’ll get &lt;SPAN class=""&gt;the same result&lt;/SPAN&gt; as simply reading it directly with &lt;SPAN class=""&gt;spark.read...&lt;BR /&gt;&lt;BR /&gt;Hope this helps to clarify, &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;Isi&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 08 May 2025 12:12:47 GMT</pubDate>
    <dc:creator>Isi</dc:creator>
    <dc:date>2025-05-08T12:12:47Z</dc:date>
    <item>
      <title>How to create an external location that accesses a public s3 bucket</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-create-an-external-location-that-accesses-a-public-s3/m-p/118424#M45638</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm trying to&amp;nbsp;create an external location that accesses a public S3 bucket (for open data). However, I'm not having any success. I'm confused about what to specify as the storage credential (IAM role), since it's a public bucket that is out of my control. By the way, I can easily select the data directly using PySpark, e.g. by calling&amp;nbsp;&lt;SPAN&gt;spark.read.&lt;/SPAN&gt;&lt;SPAN&gt;json, but I thought it made sense to use an external location first.&amp;nbsp;&lt;/SPAN&gt;Any ideas on the steps to take? Or is using an external location a waste of time in this case?&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 08 May 2025 08:45:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-create-an-external-location-that-accesses-a-public-s3/m-p/118424#M45638</guid>
      <dc:creator>deano2025</dc:creator>
      <dc:date>2025-05-08T08:45:16Z</dc:date>
    </item>
    <item>
      <title>Re: How to create an external location that accesses a public s3 bucket</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-create-an-external-location-that-accesses-a-public-s3/m-p/118462#M45648</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/163767"&gt;@deano2025&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P class=""&gt;Using an external location for a public S3 bucket is usually unnecessary.&lt;/P&gt;&lt;P class=""&gt;External locations are designed to:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P class=""&gt;Govern access to &lt;SPAN class=""&gt;&lt;STRONG&gt;private cloud storage&lt;/STRONG&gt;&lt;/SPAN&gt; (S3, ADLS, GCS)&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;Map &lt;SPAN class=""&gt;&lt;STRONG&gt;Unity Catalog permissions&lt;/STRONG&gt;&lt;/SPAN&gt; to cloud-level security via storage credentials&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;SPAN class=""&gt;Work with &lt;/SPAN&gt;&lt;STRONG&gt;managed tables, volumes, Delta Sharing&lt;/STRONG&gt;&lt;SPAN class=""&gt;, etc.&lt;/SPAN&gt;&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P class=""&gt;They rely on a &lt;SPAN class=""&gt;Storage Credential&lt;/SPAN&gt;, which is usually an IAM role or access key &lt;SPAN class=""&gt;that grants access to a private bucket.&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P class=""&gt;If you still want to create an External Location, you can create a &lt;SPAN class=""&gt;dummy Storage Credential&lt;/SPAN&gt; using a placeholder ARN like:&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;arn:aws:iam::123141241214124:role/role_test&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;Then, use that credential when defining your &lt;SPAN class=""&gt;&lt;STRONG&gt;External Location&lt;/STRONG&gt;&lt;/SPAN&gt;.&lt;/P&gt;&lt;P class=""&gt;However, since the bucket is public, you’ll get &lt;SPAN class=""&gt;the same result&lt;/SPAN&gt; as simply reading it directly with &lt;SPAN class=""&gt;spark.read...&lt;BR /&gt;&lt;BR /&gt;Hope this helps to clarify, &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;Isi&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 08 May 2025 12:12:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-create-an-external-location-that-accesses-a-public-s3/m-p/118462#M45648</guid>
      <dc:creator>Isi</dc:creator>
      <dc:date>2025-05-08T12:12:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to create an external location that accesses a public s3 bucket</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-create-an-external-location-that-accesses-a-public-s3/m-p/118666#M45674</link>
      <description>&lt;P&gt;Thanks, &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/145555"&gt;@Isi&lt;/a&gt;. Now that you've explained external locations, I agree that they are probably unnecessary in this case. Thanks for clarifying!&lt;/P&gt;</description>
      <pubDate>Fri, 09 May 2025 10:23:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-create-an-external-location-that-accesses-a-public-s3/m-p/118666#M45674</guid>
      <dc:creator>deano2025</dc:creator>
      <dc:date>2025-05-09T10:23:02Z</dc:date>
    </item>
  </channel>
</rss>

