Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Auto Loader error when assuming a role

AbdulBasit
New Contributor II

Hi @Retired_mod

I have seen numerous posts by you. Thanks for continuously providing support. Can you or your colleagues help with this?

We have a basic user that assumes a role with an S3 policy scoped to a specific bucket. When we read the bucket from a Databricks Python notebook using boto3, everything works fine.

As soon as we use Auto Loader, it fails with an exception.

Common Code

import boto3

# AWS credentials (values elided)
aws_access_key_id = ""
aws_secret_access_key = ""
role_arn = "arn:aws:iam::XXXXXXX:role/roleName"
mfa_serial_number = "XXXX" 

mfa_code = input("Enter MFA code: ")  # Prompt user for the MFA code

# Create a Boto3 STS client with the provided credentials
sts_client = boto3.client(
    'sts',
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key
)

# Assume Role with Boto3 including MFA
assumed_role = sts_client.assume_role(
    RoleArn=role_arn,
    RoleSessionName='session-name',
    SerialNumber=mfa_serial_number,
    TokenCode=mfa_code
)

credentials = assumed_role['Credentials']
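
To confirm the assumed-role credentials work outside Spark, a minimal boto3 check like the following can be used (a sketch; the bucket name "abc" is a placeholder matching the path further below):

# Hypothetical sanity check: list a few objects using the assumed-role credentials
s3 = boto3.client(
    's3',
    aws_access_key_id=credentials['AccessKeyId'],
    aws_secret_access_key=credentials['SecretAccessKey'],
    aws_session_token=credentials['SessionToken']
)
response = s3.list_objects_v2(Bucket="abc", MaxKeys=5)
for obj in response.get('Contents', []):
    print(obj['Key'])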
 
boto3 is working. Here is the code for Spark:
 
import boto3
from pyspark.sql import SparkSession
 
# Create a new Spark session
spark = SparkSession.builder \
    .appName("S3AssumeRoleSession") \
    .config("spark.hadoop.fs.s3a.assumed.role.arn", role_arn) \
    .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider") \
    .config("spark.hadoop.fs.s3a.access.key", credentials['AccessKeyId']) \
    .config("spark.hadoop.fs.s3a.secret.key", credentials['SecretAccessKey']) \
    .config("spark.hadoop.fs.s3a.session.token", credentials['SessionToken']) \
    .getOrCreate()
   
# Use Spark Session to read a JSON file from S3
path = "s3a://abc/xyz.json"
df = spark.read.json(path)
display(df)
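
Since the title mentions Auto Loader: the streaming read that fails the same way would look roughly like this (a sketch; the path and schema location are placeholders):

# Hypothetical Auto Loader (cloudFiles) equivalent of the batch read above
df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3a://abc/_schema/")
    .load("s3a://abc/"))
display(df)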
 
Exception 
 
1 REPLY

AbdulBasit
New Contributor II
Py4JJavaError: An error occurred while calling o503.json.
: java.nio.file.AccessDeniedException: s3a://xxxxxx.json: shaded.databricks.org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by AwsCredentialContextTokenProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [com.databricks.backend.daemon.driver.aws.AwsLocalCredentialContextTokenProvider@12d04309: No role specified and no roles available., com.databricks.backend.daemon.driver.aws.ProxiedIAMCredentialProvider@5adfc17a: User does not have any IAM roles]
  at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:249)
  at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:197)
  at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:4141)
  at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:4067)
  at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3947)
  at com.databricks.common.filesystem.LokiS3FS.getFileStatusNoCache(LokiS3FS.scala:84)
  at com.databricks.common.filesystem.LokiS3FS.getFileStatus(LokiS3FS.scala:74)
  at com.databricks.common.filesystem.LokiFileSystem.getFileStatus(LokiFileSystem.scala:272)
  at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1880)
  at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:60)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:416)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:389)
  at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:345)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:345)
  at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:523)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
  at py4j.Gateway.invoke(Gateway.java:306)
  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
  at py4j.commands.CallCommand.execute(CallCommand.java:79)
  at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
  at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
  at java.lang.Thread.run(Thread.java:750)
Caused by: shaded.databricks.org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by AwsCredentialContextTokenProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [com.databricks.backend.daemon.driver.aws.AwsLocalCredentialContextTokenProvider@12d04309: No role specified and no roles available., com.databricks.backend.daemon.driver.aws.ProxiedIAMCredentialProvider@5adfc17a: User does not have any IAM roles] a
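
The provider chain in the trace suggests the shaded S3A client never saw the keys set on the builder: on Databricks, SparkSession.builder.getOrCreate() returns the cluster's already-running session, and spark.hadoop.* options passed at builder time are not applied to its live Hadoop configuration. A minimal sketch of setting the same credentials on the running session instead (an untested assumption; on Databricks the provider class may also need the shaded.databricks prefix):

# Hypothetical workaround: set assumed-role credentials on the live Hadoop conf
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", credentials['AccessKeyId'])
hadoop_conf.set("fs.s3a.secret.key", credentials['SecretAccessKey'])
hadoop_conf.set("fs.s3a.session.token", credentials['SessionToken'])
# TemporaryAWSCredentialsProvider is the hadoop-aws provider that reads the session token
hadoop_conf.set("fs.s3a.aws.credentials.provider",
                "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")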
