Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unable to Read Data from S3 in Databricks (AWS Free Trial)

messiah
New Contributor II

Hey Community,

I recently signed up for a Databricks free trial on AWS and created a workspace using the quickstart method. After setting up my cluster and opening a notebook, I tried to read a Parquet file from S3 using:

 

spark.read.parquet("s3://<bucket-name>/path/")

 

However, I'm getting the following error:

 

Py4JJavaError: An error occurred while calling o408.parquet.
: java.nio.file.AccessDeniedException: s3://databricks-workspace-stack-d3546-bucket/parquet-samples: shaded.databricks.org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by AwsCredentialContextTokenProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [com.databricks.backend.daemon.driver.aws.AwsLocalCredentialContextTokenProvider@fd37933: No role specified and no roles available., com.databricks.backend.daemon.driver.aws.ProxiedIAMCredentialProvider@6d3127a1: User does not have any IAM roles]
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:249)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:197)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:4141)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:4067)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3947)
	at com.databricks.common.filesystem.LokiS3FS.getFileStatusNoCache(LokiS3FS.scala:84)
	at com.databricks.common.filesystem.LokiS3FS.getFileStatus(LokiS3FS.scala:74)
	at com.databricks.common.filesystem.LokiFileSystem.getFileStatus(LokiFileSystem.scala:272)
	at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1880)
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:60)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:416)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:389)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:345)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:345)
	at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:866)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: shaded.databricks.org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by AwsCredentialContextTokenProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [com.databricks.backend.daemon.driver.aws.AwsLocalCredentialContextTokenProvider@fd37933: No role specified and no roles available., com.databricks.backend.daemon.driver.aws.ProxiedIAMCredentialProvider@6d3127a1: User does not have any IAM roles]
	at shaded.databricks.org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:239)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1269)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:845)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:794)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1372)
	at shaded.databricks.org.apache.hadoop.fs.s3a.EnforcingDatabricksS3Client.getObjectMetadata(EnforcingDatabricksS3Client.scala:222)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$6(S3AFileSystem.java:2364)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:435)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:394)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2354)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2322)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:4122)
	... 25 more
Caused by: com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [com.databricks.backend.daemon.driver.aws.AwsLocalCredentialContextTokenProvider@fd37933: No role specified and no roles available., com.databricks.backend.daemon.driver.aws.ProxiedIAMCredentialProvider@6d3127a1: User does not have any IAM roles]
	at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:136)
	at com.databricks.backend.daemon.driver.aws.AwsCredentialContextTokenProvider.getCredentials(AwsCredentialContextTokenProvider.scala:84)
	at shaded.databricks.org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:194)
	... 44 more
File <command-7886099615157681>

 

I've already checked my AWS IAM role, and it has all the necessary S3 permissions. Could someone help me troubleshoot this issue?

Thanks in advance!

 

3 REPLIES

Isi
New Contributor III

hey @messiah 

The problem is probably in your cluster configuration.

When using a Shared Cluster in Databricks, the Instance Profile assigned to the cluster will not be used to authenticate access to AWS resources like S3. This is because Shared Clusters operate in a multi-user environment, where permissions and credentials are tied to the individual user rather than the cluster itself.

To work around this limitation, the best approach is to use External Locations if your workspace is enabled for Unity Catalog. External Locations allow administrators to define and manage access to cloud storage at the Unity Catalog level, ensuring that users can read and write data securely without needing direct access to AWS credentials.
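If your workspace is on Unity Catalog, the flow looks roughly like the sketch below, run by a metastore admin or a user with the relevant privileges. The storage credential name (my_s3_credential), the external location name (parquet_samples_loc), the grantee, and the bucket path are all placeholders, and the storage credential itself (an IAM role registered with Unity Catalog) is assumed to already exist:

# Minimal sketch, assuming an admin has already registered a storage credential
# called "my_s3_credential" that wraps an IAM role with access to the bucket.
# All object names and the S3 path below are placeholders.

# 1. Define an external location covering the path you want to read.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS parquet_samples_loc
    URL 's3://<bucket-name>/parquet-samples/'
    WITH (STORAGE CREDENTIAL my_s3_credential)
""")

# 2. Grant the user (or a group) permission to read files under it.
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION parquet_samples_loc TO `user@example.com`")

# 3. The original read now authorizes through Unity Catalog instead of the
#    cluster's (missing) instance profile.
df = spark.read.parquet("s3://<bucket-name>/parquet-samples/")
df.show(5)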

If Unity Catalog is not available or External Locations are not an option, a simple and effective alternative is to use a Single-User Cluster instead of a Shared Cluster. Single-User Clusters operate in an isolated environment where all commands run under the same user identity, allowing the assigned Instance Profile to be applied correctly. This means the cluster will have seamless access to AWS resources without requiring additional authentication mechanisms.
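If you go the single-user route, attach the instance profile under the cluster's advanced options, and the plain S3 path should then resolve on its own. A quick sanity check from a notebook could look like this (the path is a placeholder):

# On a single-user cluster with an instance profile attached, no explicit
# credentials are needed. Listing the path first is a cheap way to confirm
# the role is being picked up before running the full read.
path = "s3://<bucket-name>/parquet-samples/"

# Fails fast with an access error if the instance profile is missing or
# lacks s3:ListBucket on the bucket.
display(dbutils.fs.ls(path))

df = spark.read.parquet(path)
print(df.count())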

By leveraging External Locations where possible or switching to Single-User Clusters, you can avoid authentication issues while ensuring secure and efficient access to cloud storage in Databricks.

Hope that helps 🙂

messiah
New Contributor II

Hey @Isi 

 

I'm using a single-node cluster with DBR 14.3 LTS, and Unity Catalog is enabled. I also tried DBR 15.x LTS, but the issue persists.

 

I set up the workspace using the quickstart method, which created four Databricks roles: two for Lambda, one for EC2 instance setup, and one for S3. These roles are linked to my workspace. The cluster configuration shows "None" for the instance profile.

 

I also tried enabling credential passthrough, but nothing worked.

 

Is there any other fix for this?

 

 

Sidhant07
Databricks Employee

Hi @messiah ,

This occurs due to the lack of AWS credentials or IAM roles necessary to access the S3 bucket.
Can you please check the AWS credentials, IAM roles, and IAM permissions? Make sure the IAM role associated with the instance profile has the necessary permissions to access the S3 bucket, such as s3:ListBucket and s3:GetObject.
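For reference, a minimal read-only policy on the instance-profile role could look like the sketch below. It is written with boto3 purely for illustration; the role name, policy name, and bucket are placeholders, and you can attach the same statements through the AWS console or infrastructure-as-code instead:

import json
import boto3

# Placeholders: substitute your instance-profile role name and bucket.
ROLE_NAME = "my-databricks-s3-access-role"
BUCKET = "<bucket-name>"

# s3:ListBucket applies to the bucket ARN, s3:GetObject to the objects inside it.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}"],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/*"],
        },
    ],
}

iam = boto3.client("iam")
iam.put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="databricks-s3-read-only",
    PolicyDocument=json.dumps(policy),
)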
