<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Reading from an S3 bucket using boto3 on serverless cluster in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/115805#M45186</link>
    <description>&lt;P&gt;For use cases where you want to use cloud service credentials to authenticate to cloud services, I recommend using Unity Catalog Service Credentials. These work with serverless and classic compute in Databricks.&lt;/P&gt;
&lt;P&gt;You'd create a service credential, and then refer to it in your code like this:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;import boto3
credential = dbutils.credentials.getServiceCredentialsProvider('your-service-credential')
boto3_session = boto3.Session(botocore_session=credential, region_name='your-aws-region')
sm = boto3_session.client('secretsmanager')
sm.get_secret_value...&lt;/LI-CODE&gt;</description>
    <pubDate>Thu, 17 Apr 2025 20:42:24 GMT</pubDate>
    <dc:creator>cgrant</dc:creator>
    <dc:date>2025-04-17T20:42:24Z</dc:date>
    <item>
      <title>Reading from an S3 bucket using boto3 on serverless cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/115692#M45158</link>
      <description>&lt;P&gt;Hello All,&lt;BR /&gt;&lt;BR /&gt;I am trying to read a CSV file from my S3 bucket in a notebook running on serverless.&lt;BR /&gt;&lt;BR /&gt;I am using the two standard functions below, but I get a credentials error (&lt;U&gt;Error reading CSV from S3: Unable to locate credentials&lt;/U&gt;).&lt;BR /&gt;&lt;BR /&gt;I don't have this issue when running exactly the same code on a personal compute, which has the appropriate AWS access role attached to the compute. Using spark.read.csv() also works on serverless, but I would like to be able to use boto3 with serverless.&lt;BR /&gt;&lt;BR /&gt;Is there a way to get this to work?&lt;BR /&gt;&lt;BR /&gt;Thank you!&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV&gt;&lt;PRE&gt;def create_s3_client(key_id, access_key, region):
    return boto3.client(
        's3',
        aws_access_key_id=key_id,
        aws_secret_access_key=access_key,
        region_name=region
    )

def read_csv_from_s3(client, bucket_name, file_key):
    try:
        response = client.get_object(Bucket=bucket_name, Key=file_key)
        return pd.read_csv(response['Body'])
    except Exception as e:
        print(f"Error reading CSV from S3: {e}")
        return None

poi_data = read_csv_from_s3(s3_client, aws_bucket_name, poi_location)&lt;/PRE&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 16 Apr 2025 21:46:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/115692#M45158</guid>
      <dc:creator>petitregny</dc:creator>
      <dc:date>2025-04-16T21:46:56Z</dc:date>
    </item>
    <item>
      <title>Re: Reading from an S3 bucket using boto3 on serverless cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/115805#M45186</link>
      <description>&lt;P&gt;For use cases where you want to use cloud service credentials to authenticate to cloud services, I recommend using Unity Catalog Service Credentials. These work with serverless and classic compute in Databricks.&lt;/P&gt;
&lt;P&gt;You'd create a service credential, and then refer to it in your code like this:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;import boto3
credential = dbutils.credentials.getServiceCredentialsProvider('your-service-credential')
boto3_session = boto3.Session(botocore_session=credential, region_name='your-aws-region')
sm = boto3_session.client('secretsmanager')
sm.get_secret_value...&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 17 Apr 2025 20:42:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/115805#M45186</guid>
      <dc:creator>cgrant</dc:creator>
      <dc:date>2025-04-17T20:42:24Z</dc:date>
    </item>
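The service-credential snippet above stops at `get_secret_value`; the original question was about reading a CSV from S3. A minimal sketch of the full flow is below. The credential name, bucket, key, and region are placeholders, and the `dbutils`/`boto3` part only runs on Databricks; the CSV parsing is split into a plain helper so it works anywhere.

```python
import csv
import io


def parse_csv_body(body_bytes):
    """Turn raw CSV bytes (as returned by s3.get_object()['Body'].read())
    into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(body_bytes.decode("utf-8"))))


def read_csv_via_service_credential(bucket, key, credential_name, region):
    """Sketch: build a boto3 session from a Unity Catalog service credential
    (works on serverless) and fetch a CSV object. Databricks-only."""
    import boto3  # preinstalled on Databricks runtimes

    # dbutils is provided by the Databricks notebook environment
    credential = dbutils.credentials.getServiceCredentialsProvider(credential_name)  # noqa: F821
    session = boto3.Session(botocore_session=credential, region_name=region)
    s3 = session.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return parse_csv_body(body)


# Example call on Databricks (all names are placeholders):
# rows = read_csv_via_service_credential(
#     "my-bucket", "poi/poi.csv", "your-service-credential", "eu-west-1")
```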
    <item>
      <title>Re: Reading from an S3 bucket using boto3 on serverless cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/115838#M45198</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/159718"&gt;@petitregny&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P class=""&gt;The issue you’re encountering is likely due to the access mode of your cluster. Serverless compute uses &lt;STRONG&gt;standard/shared access mode&lt;/STRONG&gt;, which &lt;STRONG&gt;does not allow you to directly access AWS credentials&lt;/STRONG&gt; (such as the instance profile) in the same way as &lt;STRONG&gt;single-user/dedicated access mode&lt;/STRONG&gt;.&lt;/P&gt;&lt;P class=""&gt;That’s why your code works on a personal compute (with dedicated access mode and an instance profile properly attached) but fails on serverless: the credentials are not directly available in the environment.&lt;/P&gt;&lt;P class=""&gt;You can read more in the &lt;A href="https://docs.databricks.com/en/security/access-control/iam/iam-roles.html" target="_blank" rel="noopener"&gt;Databricks documentation&lt;/A&gt;:&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;“Because serverless compute for workflows uses standard access mode, your workloads must support this access mode.”&lt;/EM&gt;&lt;/P&gt;&lt;P class=""&gt;If you really need to use boto3 in this context, you have a few options:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;Use Databricks Secrets:&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;Store your AWS access key and secret in a secret scope and load them in your notebook. This isn’t the cleanest approach, but it avoids complex configuration and works in most cases.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;Use Service Credentials with Unity Catalog:&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;This is a more robust and secure solution, but it does require some architectural setup, including creating a Service Principal, assigning the correct permissions in Unity Catalog, and configuring cross-account IAM roles in AWS. If you’re not familiar with these concepts, it may feel a bit heavy at first.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;Stick with spark.read.csv() if possible:&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;Since it works under the hood with Databricks’ credential delegation and accesses S3 through an &lt;STRONG&gt;External Location&lt;/STRONG&gt;, it’s the most compatible and secure way to read data from S3 in serverless environments.&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P class=""&gt;Hope this helps &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;Isi&lt;/P&gt;</description>
      <pubDate>Fri, 18 Apr 2025 11:01:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/115838#M45198</guid>
      <dc:creator>Isi</dc:creator>
      <dc:date>2025-04-18T11:01:36Z</dc:date>
    </item>
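Option 1 above (Databricks Secrets) can be sketched as follows. The secret scope and key names are assumptions; the kwargs assembly is kept as a plain function, with the secret getter passed in, so the same code can be exercised with a stub outside Databricks.

```python
def s3_client_kwargs(get_secret, scope, region):
    """Assemble the keyword arguments for boto3.client('s3', ...) from a
    secret getter with the same call shape as dbutils.secrets.get.
    The scope and the 'aws_access_key_id' / 'aws_secret_access_key'
    key names are placeholders for whatever you stored in your scope."""
    return {
        "aws_access_key_id": get_secret(scope=scope, key="aws_access_key_id"),
        "aws_secret_access_key": get_secret(scope=scope, key="aws_secret_access_key"),
        "region_name": region,
    }


# On Databricks (scope and region are assumptions):
# import boto3
# s3 = boto3.client(
#     "s3", **s3_client_kwargs(dbutils.secrets.get, "aws-creds", "eu-west-1"))
```

Passing the getter in rather than calling `dbutils` directly also makes the notebook code unit-testable.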
    <item>
      <title>Re: Reading from an S3 bucket using boto3 on serverless cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/116141#M45251</link>
      <description>&lt;P&gt;Thank you Isi, I will try with your suggestions.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Apr 2025 07:38:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/116141#M45251</guid>
      <dc:creator>petitregny</dc:creator>
      <dc:date>2025-04-22T07:38:20Z</dc:date>
    </item>
    <item>
      <title>Re: Reading from an S3 bucket using boto3 on serverless cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/138657#M50990</link>
      <description>&lt;P&gt;Any luck on this?&lt;/P&gt;&lt;P&gt;I am also looking for options for AWS S3 interactions via Boto3 using Databricks Serverless Notebooks (Compute).&lt;/P&gt;&lt;P&gt;When I tried the new feature (&lt;A href="https://www.databricks.com/blog/introducing-serverless-support-aws-instance-profiles" target="_self"&gt;Instance Profiles with Serverless&lt;/A&gt;), DBUTILS functions work great in Notebooks, but Boto3 does not. We can use Spark read functions, but they are not meant for every operation we perform on S3.&lt;/P&gt;&lt;P&gt;I will definitely try both: creating a Boto3 client using access/secret keys, and the &lt;A href="https://docs.databricks.com/aws/en/connect/unity-catalog/cloud-services/service-credentials" target="_self"&gt;Service Credentials&lt;/A&gt; approach. Before that, I would like to see if these options have worked for anybody.&lt;/P&gt;</description>
      <pubDate>Tue, 11 Nov 2025 21:34:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/138657#M50990</guid>
      <dc:creator>Ramana</dc:creator>
      <dc:date>2025-11-11T21:34:20Z</dc:date>
    </item>
    <item>
      <title>Re: Reading from an S3 bucket using boto3 on serverless cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/138662#M50991</link>
      <description>&lt;P&gt;Boto3 with Access/Secret Key worked. I will try the Service Credentials. If the Databricks documentation is right, &lt;A href="https://www.databricks.com/blog/introducing-serverless-support-aws-instance-profiles" target="_self" rel="nofollow noopener noreferrer"&gt;Instance Profiles with Serverless&lt;/A&gt; should work to establish a Boto3 connection, but, unfortunately, setting up instance profiles on Serverless only works for Databricks-native functions like DBUTILS.&lt;/P&gt;</description>
      <pubDate>Tue, 11 Nov 2025 22:16:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/138662#M50991</guid>
      <dc:creator>Ramana</dc:creator>
      <dc:date>2025-11-11T22:16:47Z</dc:date>
    </item>
  </channel>
</rss>

