Autoloader: Cross-account bucket Assume role access denied

deng_dev

Hi everyone!

I have a Databricks instance profile role that has permission to assume a role in another AWS account to access an S3 bucket in that account.

When I try to assume the role using boto3, it correctly reads the Databricks AWS credentials, assumes the role, and is able to read the S3 file without any errors.
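The boto3 check described above can be sketched roughly as follows. This is a minimal illustration, not the exact code used: the function names, the session name, and the bucket/key arguments are hypothetical, and it assumes boto3 picks up the instance profile credentials through its default credential chain.

```python
def assume_role_credentials(sts_client, role_arn, session_name="autoloader-debug"):
    """Call STS AssumeRole and return the temporary credentials dict.

    `session_name` is an arbitrary, illustrative label for the session.
    """
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    return response["Credentials"]


def head_object_with_role(role_arn, bucket, key):
    """Assume the cross-account role, then issue a HEAD request for one object."""
    import boto3  # imported lazily so the helper above can be used without AWS access

    # The default credential chain should resolve to the instance profile here.
    creds = assume_role_credentials(boto3.client("sts"), role_arn)

    # Build an S3 client that uses only the temporary cross-account credentials.
    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    return s3.head_object(Bucket=bucket, Key=key)
```

Calling `head_object_with_role(role_arn, "<bucket>", "<key>")` from a notebook on the same cluster would exercise the same kind of HEAD request that appears in the stack trace, but through boto3 rather than the Hadoop S3 client.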

However, when I try to use this role in a cloudFiles stream, it fails with an AccessDenied error:

java.nio.file.AccessDeniedException: <bucket> getFileStatus on <bucket> AmazonS3Exception: Forbidden; request: HEAD <bucket> customer-info {} Hadoop 3.3.6, 403 Forbidden

Here is the sample code I am using:

# role_arn holds the ARN of the cross-account role to assume
options_dict = {
    "cloudFiles.roleArn": role_arn,
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "<schema_path>",  # placeholder path
    "cloudFiles.includeExistingFiles": "true",
    "multiLine": "true",
}

df = (
    spark.readStream
    .format("cloudFiles")
    .options(**options_dict)
    .load("<bucket>")  # placeholder for the S3 path in the other account
)

 
