Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Auto Loader error when assuming a role

AbdulBasit
New Contributor II

Hi @Retired_mod

I have seen numerous posts by you. Thanks for continuously providing support. Can you or your colleagues help with this?

We have a basic user that assumes a role with an S3 policy scoped to a specific bucket. When we read the bucket from a Databricks Python notebook using boto3, everything works fine.

As soon as we use Auto Loader, it fails with an exception.

Common Code

import boto3

# AWS credentials (values elided)
aws_access_key_id = ""
aws_secret_access_key = ""
role_arn = "arn:aws:iam::XXXXXXX:role/roleName"
mfa_serial_number = "XXXX" 

mfa_code = input("Enter MFA code: ")  # Prompt user for the MFA code

# Create a Boto3 STS client with the provided credentials
sts_client = boto3.client(
    'sts',
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key
)

# Assume Role with Boto3 including MFA
assumed_role = sts_client.assume_role(
    RoleArn=role_arn,
    RoleSessionName='session-name',
    SerialNumber=mfa_serial_number,
    TokenCode=mfa_code
)

credentials = assumed_role['Credentials']
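
To confirm the assumed-role credentials work outside Spark, a minimal boto3 check like the following can be used (a sketch; the bucket name "abc" is a placeholder matching the path further below):

# Hypothetical sanity check: list a few objects using the assumed-role credentials
s3 = boto3.client(
    's3',
    aws_access_key_id=credentials['AccessKeyId'],
    aws_secret_access_key=credentials['SecretAccessKey'],
    aws_session_token=credentials['SessionToken']
)
response = s3.list_objects_v2(Bucket="abc", MaxKeys=5)
for obj in response.get('Contents', []):
    print(obj['Key'])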
 
boto3 is working. Here is the code for Spark:
 
import boto3
from pyspark.sql import SparkSession
 
# Create a new Spark session
spark = SparkSession.builder \
    .appName("S3AssumeRoleSession") \
    .config("spark.hadoop.fs.s3a.assumed.role.arn", role_arn) \
    .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider") \
    .config("spark.hadoop.fs.s3a.access.key", credentials['AccessKeyId']) \
    .config("spark.hadoop.fs.s3a.secret.key", credentials['SecretAccessKey']) \
    .config("spark.hadoop.fs.s3a.session.token", credentials['SessionToken']) \
    .getOrCreate()
   
# Use Spark Session to read a JSON file from S3
path = "s3a://abc/xyz.json"
df = spark.read.json(path)
display(df)
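
Since the title mentions Auto Loader: the streaming read that fails the same way would look roughly like this (a sketch; the path and schema location are placeholders):

# Hypothetical Auto Loader (cloudFiles) equivalent of the batch read above
df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3a://abc/_schema/")
    .load("s3a://abc/"))
display(df)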
 
Exception 
 
1 REPLY

AbdulBasit
New Contributor II
Py4JJavaError: An error occurred while calling o503.json.
: java.nio.file.AccessDeniedException: s3a://xxxxxx.json: shaded.databricks.org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by AwsCredentialContextTokenProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [com.databricks.backend.daemon.driver.aws.AwsLocalCredentialContextTokenProvider@12d04309: No role specified and no roles available., com.databricks.backend.daemon.driver.aws.ProxiedIAMCredentialProvider@5adfc17a: User does not have any IAM roles]
  at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:249)
  at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:197)
  at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:4141)
  at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:4067)
  at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3947)
  at com.databricks.common.filesystem.LokiS3FS.getFileStatusNoCache(LokiS3FS.scala:84)
  at com.databricks.common.filesystem.LokiS3FS.getFileStatus(LokiS3FS.scala:74)
  at com.databricks.common.filesystem.LokiFileSystem.getFileStatus(LokiFileSystem.scala:272)
  at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1880)
  at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:60)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:416)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:389)
  at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:345)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:345)
  at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:523)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
  at py4j.Gateway.invoke(Gateway.java:306)
  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
  at py4j.commands.CallCommand.execute(CallCommand.java:79)
  at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
  at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
  at java.lang.Thread.run(Thread.java:750)
Caused by: shaded.databricks.org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by AwsCredentialContextTokenProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [com.databricks.backend.daemon.driver.aws.AwsLocalCredentialContextTokenProvider@12d04309: No role specified and no roles available., com.databricks.backend.daemon.driver.aws.ProxiedIAMCredentialProvider@5adfc17a: User does not have any IAM roles] a
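
The provider chain in the trace suggests the shaded S3A client never saw the keys set on the builder: on Databricks, SparkSession.builder.getOrCreate() returns the cluster's already-running session, and spark.hadoop.* options passed at builder time are not applied to its live Hadoop configuration. A minimal sketch of setting the same credentials on the running session instead (an untested assumption; on Databricks the provider class may also need the shaded.databricks prefix):

# Hypothetical workaround: set assumed-role credentials on the live Hadoop conf
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", credentials['AccessKeyId'])
hadoop_conf.set("fs.s3a.secret.key", credentials['SecretAccessKey'])
hadoop_conf.set("fs.s3a.session.token", credentials['SessionToken'])
# TemporaryAWSCredentialsProvider is the hadoop-aws provider that reads the session token
hadoop_conf.set("fs.s3a.aws.credentials.provider",
                "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")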
