Hi @petitregny,
The issue you’re encountering is likely due to the access mode of your cluster. Serverless compute uses standard/shared access mode, which does not allow you to directly access AWS credentials (such as the instance profile) in the same way as single-user/dedicated access mode.
That’s why your code works on personal compute (with dedicated access mode and an instance profile properly attached) but fails on serverless: the credentials are simply not available in the environment.
You can read more in the Databricks documentation:
“Because serverless compute for workflows uses standard access mode, your workloads must support this access mode.”
If you really need to use boto3 in this context, you have a few options:
Use Databricks Secrets:
Store your AWS access key and secret in a secret scope and load them in your notebook. This isn’t the cleanest approach, but it avoids complex configuration and works in most cases.
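A minimal sketch of that approach, assuming the scope and key names below are placeholders you replace with your own (bucket, prefix, and region too):

```python
import boto3

# Placeholder scope/key names - replace with your own secret scope and keys
aws_access_key = dbutils.secrets.get(scope="aws-creds", key="access-key-id")
aws_secret_key = dbutils.secrets.get(scope="aws-creds", key="secret-access-key")

s3 = boto3.client(
    "s3",
    aws_access_key_id=aws_access_key,
    aws_secret_access_key=aws_secret_key,
    region_name="eu-west-1",  # adjust to your bucket's region
)

# Quick sanity check: list a few objects under a prefix
for obj in s3.list_objects_v2(Bucket="my-bucket", Prefix="my/prefix/").get("Contents", []):
    print(obj["Key"])
```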
Use Service Credentials with Unity Catalog:
This is a more robust and secure solution, but it does require some architectural setup, including creating a Service Principal, assigning the correct permissions in Unity Catalog, and configuring cross-account IAM roles in AWS. If you’re not familiar with these concepts, it may feel a bit heavy at first.
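Once the service credential exists in Unity Catalog and your identity has permission to use it, the boto3 side is fairly small. A rough sketch (the credential name, bucket, prefix, and region are placeholders for your own setup):

```python
import boto3

# "my-service-credential" is a placeholder - use the name of your UC service credential
session = boto3.Session(
    botocore_session=dbutils.credentials.getServiceCredentialsProvider("my-service-credential"),
    region_name="eu-west-1",  # adjust to your bucket's region
)

s3 = session.client("s3")
response = s3.list_objects_v2(Bucket="my-bucket", Prefix="my/prefix/")
print([obj["Key"] for obj in response.get("Contents", [])])
```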
Stick with spark.read.csv() if possible:
Since it relies on Databricks’ credential delegation under the hood and accesses S3 through an External Location, it’s the most compatible and secure way to read data from S3 on serverless.
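For reference, something like this (the S3 path is a placeholder and must be covered by an External Location you can read from):

```python
# Placeholder path - must fall under an External Location you have READ FILES on
df = spark.read.csv(
    "s3://my-bucket/path/to/data/",
    header=True,
    inferSchema=True,
)
display(df)
```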
Hope this helps 🙂
Isi