<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Trouble connecting to Amazon S3 using Spark in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/it-s-not-going-well-to-connect-to-amazon-s3-with-using-spark/m-p/122194#M46691</link>
    <description>&lt;P&gt;I can't connect to Amazon S3 from Spark.&lt;BR /&gt;I'm following this document:&amp;nbsp;&lt;A href="https://docs.databricks.com/gcp/en/connect/storage/amazon-s3" target="_blank"&gt;https://docs.databricks.com/gcp/en/connect/storage/amazon-s3&lt;/A&gt;&lt;/P&gt;&lt;P&gt;But I still can't access the S3 bucket.&lt;/P&gt;&lt;P&gt;I believe the credentials are correct because I have verified that I can access S3 via boto3.&lt;/P&gt;&lt;P&gt;However, I'm using an instance profile to access other S3 buckets.&amp;nbsp;&lt;BR /&gt;Could this be the cause?&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
    <pubDate>Wed, 18 Jun 2025 23:28:12 GMT</pubDate>
    <dc:creator>Yuki</dc:creator>
    <dc:date>2025-06-18T23:28:12Z</dc:date>
    <item>
      <title>Trouble connecting to Amazon S3 using Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/it-s-not-going-well-to-connect-to-amazon-s3-with-using-spark/m-p/122194#M46691</link>
      <description>&lt;P&gt;I can't connect to Amazon S3 from Spark.&lt;BR /&gt;I'm following this document:&amp;nbsp;&lt;A href="https://docs.databricks.com/gcp/en/connect/storage/amazon-s3" target="_blank"&gt;https://docs.databricks.com/gcp/en/connect/storage/amazon-s3&lt;/A&gt;&lt;/P&gt;&lt;P&gt;But I still can't access the S3 bucket.&lt;/P&gt;&lt;P&gt;I believe the credentials are correct because I have verified that I can access S3 via boto3.&lt;/P&gt;&lt;P&gt;However, I'm using an instance profile to access other S3 buckets.&amp;nbsp;&lt;BR /&gt;Could this be the cause?&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jun 2025 23:28:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/it-s-not-going-well-to-connect-to-amazon-s3-with-using-spark/m-p/122194#M46691</guid>
      <dc:creator>Yuki</dc:creator>
      <dc:date>2025-06-18T23:28:12Z</dc:date>
    </item>
    <item>
      <title>Re: Trouble connecting to Amazon S3 using Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/it-s-not-going-well-to-connect-to-amazon-s3-with-using-spark/m-p/122462#M46781</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/93088"&gt;@Yuki&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P class=""&gt;If you’re using &lt;SPAN class=""&gt;&lt;STRONG&gt;instance profiles&lt;/STRONG&gt;&lt;/SPAN&gt; to access S3, make sure your &lt;SPAN class=""&gt;&lt;STRONG&gt;cluster is running in “Single User” (or Dedicated) access mode&lt;/STRONG&gt;&lt;/SPAN&gt;. Instance profiles &lt;SPAN class=""&gt;&lt;STRONG&gt;won’t work with Shared (or Standard) or No Isolation clusters&lt;/STRONG&gt;&lt;/SPAN&gt;, especially if you’re trying to access S3 from Unity Catalog or within notebooks.&lt;/P&gt;&lt;P class=""&gt;&amp;nbsp;&lt;/P&gt;&lt;P class=""&gt;You can check this by going to your cluster settings and verifying that its Access Mode is set to&amp;nbsp;&lt;SPAN class=""&gt;&lt;STRONG&gt;Single User/Dedicated&lt;/STRONG&gt;&lt;/SPAN&gt;, and that the correct user is assigned (the one mapped to the instance profile, either directly or via group policies).&lt;BR /&gt;&lt;BR /&gt;If this doesn't solve your problem, please post the cluster configuration, the instance profile JSON, and the error &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P class=""&gt;Hope this helps &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;Isi&lt;/P&gt;</description>
      <pubDate>Sun, 22 Jun 2025 11:50:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/it-s-not-going-well-to-connect-to-amazon-s3-with-using-spark/m-p/122462#M46781</guid>
      <dc:creator>Isi</dc:creator>
      <dc:date>2025-06-22T11:50:49Z</dc:date>
    </item>
    <item>
      <title>Re: Trouble connecting to Amazon S3 using Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/it-s-not-going-well-to-connect-to-amazon-s3-with-using-spark/m-p/122478#M46788</link>
      <description>&lt;P&gt;Hi Isi,&lt;/P&gt;&lt;P&gt;Thank you for your response — I really appreciate it &lt;span class="lia-unicode-emoji" title=":grinning_face:"&gt;😀&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Apologies, I didn’t explain my concern clearly.&lt;/P&gt;&lt;P&gt;What I’m trying to confirm is whether the instance profile overrides the spark.conf settings defined in a notebook.&lt;/P&gt;&lt;P&gt;For example, I want to access a CSV on S3 using the following code:&lt;/P&gt;&lt;PRE&gt;# global level
spark.conf.set("spark.hadoop.fs.s3a.endpoint", "s3.amazonaws.com")
spark.conf.set("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
spark.conf.set("spark.hadoop.fs.s3a.server-side-encryption-algorithm", "SSE-KMS")

# Set per-bucket credentials using Databricks secrets (after the SparkSession is created)
spark.conf.set(f"spark.hadoop.fs.s3a.bucket.{source_bucket}.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
spark.conf.set(f"spark.hadoop.fs.s3a.bucket.{source_bucket}.endpoint", "s3.ap-northeast-1.amazonaws.com")
spark.conf.set(f"spark.hadoop.fs.s3a.bucket.{source_bucket}.access.key", source_access_key)
spark.conf.set(f"spark.hadoop.fs.s3a.bucket.{source_bucket}.secret.key", source_secret_key)
spark.conf.set(f"spark.hadoop.fs.s3a.bucket.{source_bucket}.region", source_region)

df = spark.read.option("header", True).csv(source_path)&lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;I can access S3 via boto3, but I can't access it from Spark.&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;The error message is:&lt;BR /&gt;`: java.nio.file.AccessDeniedException: s3a://&amp;lt;source_bucket&amp;gt;/foo.csv: getFileStatus on s3a://&amp;lt;source_bucket&amp;gt;/foo.csv: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request: HEAD`&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I suspect the issue is caused by the instance profile overwriting the credentials. I apologize if my hypothesis caused any misunderstanding of the current status.&lt;/P&gt;&lt;P&gt;Finally, my cluster is in Dedicated mode now; thank you again for your advice.&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Sun, 22 Jun 2025 23:31:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/it-s-not-going-well-to-connect-to-amazon-s3-with-using-spark/m-p/122478#M46788</guid>
      <dc:creator>Yuki</dc:creator>
      <dc:date>2025-06-22T23:31:44Z</dc:date>
    </item>
  </channel>
</rss>

