Data Engineering

Autoloader error when assuming a role

basit_siddiqui
New Contributor III

Hi @Retired_mod,

I have seen numerous posts by you; thanks for continuously providing support. Could you or your colleagues help with this?

We have a basic user that assumes a role with an S3 policy scoped to a specific bucket. When we read the bucket from a Databricks Python notebook using boto3, everything works fine.

As soon as we go through Spark (Auto Loader, or even the plain spark.read.json shown below), it fails with an exception.
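For context, the Auto Loader read looks roughly like this (a sketch with placeholder paths; the actual snippet isn't in the post):

# Sketch of the Auto Loader read; paths are placeholders
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "s3a://abc/_schemas/")
      .load("s3a://abc/"))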

Common Code

import boto3

# AWS credentials for the base user (redacted)
aws_access_key_id = ""
aws_secret_access_key = ""
role_arn = "arn:aws:iam::XXXXXXX:role/roleName"
mfa_serial_number = "XXXX"

mfa_code = input("Enter MFA code: ")  # Prompt the user for the MFA code

# Create a boto3 STS client with the base user's credentials
sts_client = boto3.client(
    'sts',
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key
)

# Assume the role, passing the MFA device and token
assumed_role = sts_client.assume_role(
    RoleArn=role_arn,
    RoleSessionName='session-name',
    SerialNumber=mfa_serial_number,
    TokenCode=mfa_code
)

# Temporary credentials: AccessKeyId, SecretAccessKey, SessionToken
credentials = assumed_role['Credentials']
 
boto3 works with these temporary credentials. For example, a quick check along these lines succeeds (a hypothetical sketch; the bucket name is a placeholder):
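# Hypothetical check: list the bucket with the assumed-role credentials
s3 = boto3.client(
    's3',
    aws_access_key_id=credentials['AccessKeyId'],
    aws_secret_access_key=credentials['SecretAccessKey'],
    aws_session_token=credentials['SessionToken']
)
for obj in s3.list_objects_v2(Bucket='abc', MaxKeys=5).get('Contents', []):
    print(obj['Key'])

Here is the Spark code that fails: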
 
from pyspark.sql import SparkSession

# Build a Spark session configured with the assumed-role credentials
spark = SparkSession.builder \
    .appName("S3AssumeRoleSession") \
    .config("spark.hadoop.fs.s3a.assumed.role.arn", role_arn) \
    .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider") \
    .config("spark.hadoop.fs.s3a.access.key", credentials['AccessKeyId']) \
    .config("spark.hadoop.fs.s3a.secret.key", credentials['SecretAccessKey']) \
    .config("spark.hadoop.fs.s3a.session.token", credentials['SessionToken']) \
    .getOrCreate()

# Read a JSON file from S3 (path redacted)
path = "s3a://abc/xyz.json"
df = spark.read.json(path)
display(df)
 
Exception 
 
1 REPLY

basit_siddiqui
New Contributor III
Py4JJavaError: An error occurred while calling o503.json.
: java.nio.file.AccessDeniedException: s3a://xxxxxx.json: shaded.databricks.org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by AwsCredentialContextTokenProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [com.databricks.backend.daemon.driver.aws.AwsLocalCredentialContextTokenProvider@12d04309: No role specified and no roles available., com.databricks.backend.daemon.driver.aws.ProxiedIAMCredentialProvider@5adfc17a: User does not have any IAM roles]
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:249)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:197)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:4141)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:4067)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3947)
	at com.databricks.common.filesystem.LokiS3FS.getFileStatusNoCache(LokiS3FS.scala:84)
	at com.databricks.common.filesystem.LokiS3FS.getFileStatus(LokiS3FS.scala:74)
	at com.databricks.common.filesystem.LokiFileSystem.getFileStatus(LokiFileSystem.scala:272)
	at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1880)
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:60)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:416)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:389)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:345)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:345)
	at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:523)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
	at java.lang.Thread.run(Thread.java:750)
Caused by: shaded.databricks.org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by AwsCredentialContextTokenProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [com.databricks.backend.daemon.driver.aws.AwsLocalCredentialContextTokenProvider@12d04309: No role specified and no roles available., com.databricks.backend.daemon.driver.aws.ProxiedIAMCredentialProvider@5adfc17a: User does not have any IAM roles]
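The trace shows the request falling through to the Databricks credential chain (AwsLocalCredentialContextTokenProvider, then ProxiedIAMCredentialProvider), which suggests the builder configs above never reached the filesystem: on Databricks the SparkSession already exists, so getOrCreate() returns it unchanged. A minimal sketch of setting the S3A options on the running session instead, using Hadoop's standard provider for key/secret/token triples (an assumed workaround, not verified against this setup):

# Assumed sketch: apply the temporary credentials to the running session
spark.conf.set("fs.s3a.access.key", credentials['AccessKeyId'])
spark.conf.set("fs.s3a.secret.key", credentials['SecretAccessKey'])
spark.conf.set("fs.s3a.session.token", credentials['SessionToken'])
# TemporaryAWSCredentialsProvider handles access key + secret + session token
spark.conf.set("fs.s3a.aws.credentials.provider",
               "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")

df = spark.read.json("s3a://abc/xyz.json")
display(df)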
