04-16-2025 02:46 PM
Hello All,
I am trying to read a CSV file from my S3 bucket in a notebook running on serverless.
I am using the two standard functions below, but I get a credentials error (Error reading CSV from S3: Unable to locate credentials).
I don't have this issue when running exactly the same code on a personal compute, which has the appropriate AWS access role attached to the compute. Using spark.read.csv() also works on serverless, but I would like to be able to use boto3 with serverless.
Is there a way to get this to work?
Thank you!
import boto3
import pandas as pd

def create_s3_client(key_id, access_key, region):
    # Build a boto3 S3 client from explicit credentials
    return boto3.client(
        's3',
        aws_access_key_id=key_id,
        aws_secret_access_key=access_key,
        region_name=region
    )

def read_csv_from_s3(client, bucket_name, file_key):
    # Fetch the object and load its body into a pandas DataFrame
    try:
        response = client.get_object(Bucket=bucket_name, Key=file_key)
        return pd.read_csv(response['Body'])
    except Exception as e:
        print(f"Error reading CSV from S3: {e}")
        return None

poi_data = read_csv_from_s3(s3_client, aws_bucket_name, poi_location)
04-17-2025 01:42 PM
For use cases where you want to use cloud service credentials to authenticate to cloud services, I recommend using Unity Catalog Service Credentials. These work with both serverless and classic compute in Databricks.
You'd create a service credential, and then refer to it in your code like this:
import boto3
credential = dbutils.credentials.getServiceCredentialsProvider('your-service-credential')
boto3_session = boto3.Session(botocore_session=credential, region_name='your-aws-region')
sm = boto3_session.client('secretsmanager')
sm.get_secret_value...
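For the S3/CSV use case in the original post, the same session can be pointed at S3 instead of Secrets Manager. A minimal sketch, assuming a service credential named 'your-service-credential' and placeholder bucket, key, and region names:

import boto3
import pandas as pd

# Assumes a Unity Catalog service credential named 'your-service-credential' (placeholder)
credential = dbutils.credentials.getServiceCredentialsProvider('your-service-credential')
boto3_session = boto3.Session(botocore_session=credential, region_name='your-aws-region')

# Read a CSV object into pandas -- bucket and key below are placeholders
s3 = boto3_session.client('s3')
response = s3.get_object(Bucket='your-bucket', Key='path/to/file.csv')
poi_data = pd.read_csv(response['Body'])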
04-18-2025 04:01 AM
Hi @petitregny,
The issue you're encountering is likely due to the access mode of your compute. Serverless compute uses standard/shared access mode, which does not allow you to directly access AWS credentials (such as the instance profile) in the same way as single-user/dedicated access mode.
That's why your code works on a personal compute (with dedicated access mode and an instance profile properly attached) but fails on serverless: the credentials are not directly available in the environment.
You can read more in the Databricks documentation:
"Because serverless compute for workflows uses standard access mode, your workloads must support this access mode."
If you really need to use boto3 in this context, you have a few options:
Use Databricks Secrets:
Store your AWS access key and secret key in a secret scope and load them in your notebook (see the sketch after this list). This isn't the cleanest approach, but it avoids complex configuration and works in most cases.
Use Service Credentials with Unity Catalog:
This is a more robust and secure solution, but it does require some architectural setup, including creating a Service Principal, assigning the correct permissions in Unity Catalog, and configuring cross-account IAM roles in AWS. If you're not familiar with these concepts, it may feel a bit heavy at first.
Stick with spark.read.csv() if possible, since it already works for you on serverless.
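For the Databricks Secrets option, a minimal sketch of what loading the keys from a secret scope could look like (the scope name, key names, bucket, and path below are placeholders, not real values):

import boto3
import pandas as pd

# Placeholder secret scope and key names -- replace with your own
key_id = dbutils.secrets.get(scope="aws", key="aws_access_key_id")
access_key = dbutils.secrets.get(scope="aws", key="aws_secret_access_key")

s3_client = boto3.client(
    's3',
    aws_access_key_id=key_id,
    aws_secret_access_key=access_key,
    region_name='your-aws-region'
)
response = s3_client.get_object(Bucket='your-bucket', Key='path/to/file.csv')
poi_data = pd.read_csv(response['Body'])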
Hope this helps!
Isi
04-22-2025 12:38 AM
Thank you Isi, I will try with your suggestions.
an hour ago - last edited an hour ago
Any luck on this?
I am also looking for options for AWS S3 interactions via Boto3 from Databricks serverless notebooks (compute).
When I tried the new feature (instance profiles with serverless), dbutils functions work great in notebooks, but Boto3 does not. We can use Spark read functions, but they are not suited to every operation we perform on S3 (see the illustrative snippet below).
I will definitely try both creating a Boto3 client with access/secret keys and the Service Credentials approach, but first I would like to know whether these options have worked for anybody.
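For clarity, the paths that do work for me on serverless are along these lines (bucket and path names are illustrative only):

# dbutils works with the instance profile attached to serverless (illustrative path)
display(dbutils.fs.ls("s3://your-bucket/some/prefix/"))

# Spark read also works, but is not suited to every S3 operation
df = spark.read.csv("s3://your-bucket/path/to/file.csv", header=True)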
20m ago
Boto3 with access/secret key worked. I will try the Service Credentials next. If the Databricks documentation is right, instance profiles with serverless should also work for establishing a Boto3 connection, but unfortunately, setting up instance profiles on serverless only works for Databricks-native functions like dbutils.