<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Reading from an S3 bucket using boto3 on serverless cluster in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/115805#M45186</link>
    <description>&lt;P&gt;For use cases where you want to use cloud service credentials to authenticate to cloud services, I recommend using Unity Catalog Service Credentials. These work with serverless and classic compute in Databricks.&lt;/P&gt;
&lt;P&gt;You'd create a service credential, and then refer to it in your code like this:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;import boto3
credential = dbutils.credentials.getServiceCredentialsProvider('your-service-credential')
boto3_session = boto3.Session(botocore_session=credential, region_name='your-aws-region')
sm = boto3_session.client('secretsmanager')
sm.get_secret_value...&lt;/LI-CODE&gt;</description>
    <pubDate>Thu, 17 Apr 2025 20:42:24 GMT</pubDate>
    <dc:creator>cgrant</dc:creator>
    <dc:date>2025-04-17T20:42:24Z</dc:date>
    <item>
      <title>Reading from an S3 bucket using boto3 on serverless cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/115692#M45158</link>
      <description>&lt;P&gt;Hello All,&lt;BR /&gt;&lt;BR /&gt;I am trying to read a CSV file from my S3 bucket in a notebook running on serverless.&lt;BR /&gt;&lt;BR /&gt;I am using the two standard functions below, but I get a credentials error (&lt;U&gt;Error reading CSV from S3: Unable to locate credentials&lt;/U&gt;).&lt;BR /&gt;&lt;BR /&gt;I don't have this issue when running exactly the same code on a personal compute, which has the appropriate AWS access role attached to the compute. Using spark.read.csv() also works on serverless, but I would like to be able to use boto3 with serverless.&lt;BR /&gt;&lt;BR /&gt;Is there a way to get this to work?&lt;BR /&gt;&lt;BR /&gt;Thank you!&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV&gt;&lt;PRE&gt;def create_s3_client(key_id, access_key, region):
    return boto3.client(
        's3',
        aws_access_key_id=key_id,
        aws_secret_access_key=access_key,
        region_name=region
    )

def read_csv_from_s3(client, bucket_name, file_key):
    try:
        response = client.get_object(Bucket=bucket_name, Key=file_key)
        return pd.read_csv(response['Body'])
    except Exception as e:
        print(f"Error reading CSV from S3: {e}")
        return None

poi_data = read_csv_from_s3(s3_client, aws_bucket_name, poi_location)&lt;/PRE&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 16 Apr 2025 21:46:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/115692#M45158</guid>
      <dc:creator>petitregny</dc:creator>
      <dc:date>2025-04-16T21:46:56Z</dc:date>
    </item>
    <item>
      <title>Re: Reading from an S3 bucket using boto3 on serverless cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/115805#M45186</link>
      <description>&lt;P&gt;For use cases where you want to use cloud service credentials to authenticate to cloud services, I recommend using Unity Catalog Service Credentials. These work with serverless and classic compute in Databricks.&lt;/P&gt;
&lt;P&gt;You'd create a service credential, and then refer to it in your code like this:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;import boto3
credential = dbutils.credentials.getServiceCredentialsProvider('your-service-credential')
boto3_session = boto3.Session(botocore_session=credential, region_name='your-aws-region')
sm = boto3_session.client('secretsmanager')
sm.get_secret_value...&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 17 Apr 2025 20:42:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/115805#M45186</guid>
      <dc:creator>cgrant</dc:creator>
      <dc:date>2025-04-17T20:42:24Z</dc:date>
    </item>
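The service-credential snippet above stops at `get_secret_value`; the original question was about reading a CSV from S3. A minimal sketch of the full flow is below. The credential name, bucket, key, and region are placeholders, and the `dbutils`/`boto3` part only runs on Databricks; the CSV parsing is split into a plain helper so it works anywhere.

```python
import csv
import io


def parse_csv_body(body_bytes):
    """Turn raw CSV bytes (as returned by s3.get_object()['Body'].read())
    into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(body_bytes.decode("utf-8"))))


def read_csv_via_service_credential(bucket, key, credential_name, region):
    """Sketch: build a boto3 session from a Unity Catalog service credential
    (works on serverless) and fetch a CSV object. Databricks-only."""
    import boto3  # preinstalled on Databricks runtimes

    # dbutils is provided by the Databricks notebook environment
    credential = dbutils.credentials.getServiceCredentialsProvider(credential_name)  # noqa: F821
    session = boto3.Session(botocore_session=credential, region_name=region)
    s3 = session.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return parse_csv_body(body)


# Example call on Databricks (all names are placeholders):
# rows = read_csv_via_service_credential(
#     "my-bucket", "poi/poi.csv", "your-service-credential", "eu-west-1")
```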
    <item>
      <title>Re: Reading from an S3 bucket using boto3 on serverless cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/115838#M45198</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/159718"&gt;@petitregny&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P class=""&gt;The issue you’re encountering is likely due to the access mode of your cluster. Serverless compute uses &lt;STRONG&gt;standard/shared access mode&lt;/STRONG&gt;, which &lt;STRONG&gt;does not allow you to directly access AWS credentials&lt;/STRONG&gt; (such as the instance profile) in the same way as &lt;STRONG&gt;single-user/dedicated access mode&lt;/STRONG&gt;.&lt;/P&gt;&lt;P class=""&gt;That’s why your code works on a personal compute (with dedicated access mode and an instance profile properly attached) but fails on serverless: the credentials are not directly available in the environment.&lt;/P&gt;&lt;P class=""&gt;You can read more in the &lt;A href="https://docs.databricks.com/en/security/access-control/iam/iam-roles.html" target="_blank" rel="noopener"&gt;Databricks documentation&lt;/A&gt;:&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;“Because serverless compute for workflows uses standard access mode, your workloads must support this access mode.”&lt;/EM&gt;&lt;/P&gt;&lt;P class=""&gt;If you really need to use boto3 in this context, you have a few options:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;Use Databricks Secrets:&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;Store your AWS access key and secret in a secret scope and load them in your notebook. This isn’t the cleanest approach, but it avoids complex configuration and works in most cases.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;Use Service Credentials with Unity Catalog:&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;This is a more robust and secure solution, but it does require some architectural setup, including creating a Service Principal, assigning the correct permissions in Unity Catalog, and configuring cross-account IAM roles in AWS. If you’re not familiar with these concepts, it may feel a bit heavy at first.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;Stick with spark.read.csv() if possible:&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;Since it works under the hood with Databricks’ credential delegation and accesses S3 through an &lt;STRONG&gt;External Location&lt;/STRONG&gt;, it’s the most compatible and secure way to read data from S3 in serverless environments.&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P class=""&gt;Hope this helps &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;Isi&lt;/P&gt;</description>
      <pubDate>Fri, 18 Apr 2025 11:01:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/115838#M45198</guid>
      <dc:creator>Isi</dc:creator>
      <dc:date>2025-04-18T11:01:36Z</dc:date>
    </item>
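Option 1 above (Databricks Secrets) can be sketched as follows. The secret scope and key names are assumptions; the kwargs assembly is kept as a plain function, with the secret getter passed in, so the same code can be exercised with a stub outside Databricks.

```python
def s3_client_kwargs(get_secret, scope, region):
    """Assemble the keyword arguments for boto3.client('s3', ...) from a
    secret getter with the same call shape as dbutils.secrets.get.
    The scope and the 'aws_access_key_id' / 'aws_secret_access_key'
    key names are placeholders for whatever you stored in your scope."""
    return {
        "aws_access_key_id": get_secret(scope=scope, key="aws_access_key_id"),
        "aws_secret_access_key": get_secret(scope=scope, key="aws_secret_access_key"),
        "region_name": region,
    }


# On Databricks (scope and region are assumptions):
# import boto3
# s3 = boto3.client(
#     "s3", **s3_client_kwargs(dbutils.secrets.get, "aws-creds", "eu-west-1"))
```

Passing the getter in rather than calling `dbutils` directly also makes the notebook code unit-testable.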
    <item>
      <title>Re: Reading from an S3 bucket using boto3 on serverless cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/116141#M45251</link>
      <description>&lt;P&gt;Thank you Isi, I will try with your suggestions.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Apr 2025 07:38:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/116141#M45251</guid>
      <dc:creator>petitregny</dc:creator>
      <dc:date>2025-04-22T07:38:20Z</dc:date>
    </item>
    <item>
      <title>Re: Reading from an S3 bucket using boto3 on serverless cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/138657#M50990</link>
      <description>&lt;P&gt;Any luck on this?&lt;/P&gt;&lt;P&gt;I am also looking for options for AWS S3 interactions via Boto3 using Databricks Serverless Notebooks (Compute).&lt;/P&gt;&lt;P&gt;When I tried the new feature (&lt;A href="https://www.databricks.com/blog/introducing-serverless-support-aws-instance-profiles" target="_self"&gt;Instance Profiles with Serverless&lt;/A&gt;), DBUTILS functions work great in Notebooks, but Boto3 does not. We can use Spark read functions, but they are not meant for every operation we perform on S3.&lt;/P&gt;&lt;P&gt;I will definitely try both: creating a Boto3 client using access/secret keys, and the &lt;A href="https://docs.databricks.com/aws/en/connect/unity-catalog/cloud-services/service-credentials" target="_self"&gt;Service Credentials&lt;/A&gt; approach. Before that, I would like to see if these options have worked for anybody.&lt;/P&gt;</description>
      <pubDate>Tue, 11 Nov 2025 21:34:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/138657#M50990</guid>
      <dc:creator>Ramana</dc:creator>
      <dc:date>2025-11-11T21:34:20Z</dc:date>
    </item>
    <item>
      <title>Re: Reading from an S3 bucket using boto3 on serverless cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/138662#M50991</link>
      <description>&lt;P&gt;Boto3 with Access/Secret Key worked. I will try the Service Credentials. If the Databricks documentation is right, &lt;A href="https://www.databricks.com/blog/introducing-serverless-support-aws-instance-profiles" target="_self" rel="nofollow noopener noreferrer"&gt;Instance Profiles with Serverless&lt;/A&gt; should work to establish a Boto3 connection, but, unfortunately, setting up instance profiles on Serverless only works for Databricks-native functions like DBUTILS.&lt;/P&gt;</description>
      <pubDate>Tue, 11 Nov 2025 22:16:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/reading-from-an-s3-bucket-using-boto3-on-serverless-cluster/m-p/138662#M50991</guid>
      <dc:creator>Ramana</dc:creator>
      <dc:date>2025-11-11T22:16:47Z</dc:date>
    </item>
  </channel>
</rss>

