Hi everyone!
I have a Databricks instance profile role that has permission to assume a role in another AWS account to access an S3 bucket in that account.
When I assume the role using boto3 on the same cluster, it correctly picks up the Databricks AWS credentials, assumes the role, and reads the S3 file without any errors.
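For reference, the boto3 check looks roughly like this (a minimal sketch: role_arn is the same cross-account role ARN, and the bucket and key are placeholders):

import boto3

# Assume the cross-account role using the cluster's instance profile credentials
sts = boto3.client("sts")
assumed = sts.assume_role(
    RoleArn=role_arn,
    RoleSessionName="s3-access-test",  # arbitrary session name
)
creds = assumed["Credentials"]

# Read the S3 object with the temporary credentials from the assumed role
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
obj = s3.get_object(Bucket="<bucket>", Key="<key>")
print(obj["Body"].read())  # succeeds without errors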
However, when I try to use this role in a cloudFiles stream, it fails with an AccessDenied error:

java.nio.file.AccessDeniedException: <bucket>: getFileStatus on <bucket>: AmazonS3Exception: Forbidden; request: HEAD <bucket> customer-info {}; Hadoop 3.3.6, 403 Forbidden
Here is sample code I am using:
options_dict = {
    "cloudFiles.roleArn": role_arn,
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "<schema_path>",
    "cloudFiles.includeExistingFiles": "true",
    "multiLine": "true",
}
df = (spark.readStream
    .format("cloudFiles")
    .options(**options_dict)
    .load("<bucket>")
)
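For completeness, I start the stream with a plain writeStream, roughly like this (the checkpoint location and target table are placeholders; there is nothing special about how the query is triggered):

(df.writeStream
    .format("delta")
    .option("checkpointLocation", "<checkpoint_path>")
    .trigger(availableNow=True)
    .toTable("<target_table>")
)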