Unable to Read Data from S3 in Databricks (AWS Free Trial)
3 weeks ago - last edited 3 weeks ago
Hey Community,
I recently signed up for a Databricks free trial on AWS and created a workspace using the quickstart method. After setting up my cluster and opening a notebook, I tried to read a Parquet file from S3 using:
spark.read.parquet("s3://<bucket-name>/path/")
However, I'm getting the following error:
Py4JJavaError: An error occurred while calling o408.parquet.
: java.nio.file.AccessDeniedException: s3://databricks-workspace-stack-d3546-bucket/parquet-samples: shaded.databricks.org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by AwsCredentialContextTokenProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [com.databricks.backend.daemon.driver.aws.AwsLocalCredentialContextTokenProvider@fd37933: No role specified and no roles available., com.databricks.backend.daemon.driver.aws.ProxiedIAMCredentialProvider@6d3127a1: User does not have any IAM roles]
at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:249)
at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:197)
at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:4141)
at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:4067)
at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3947)
at com.databricks.common.filesystem.LokiS3FS.getFileStatusNoCache(LokiS3FS.scala:84)
at com.databricks.common.filesystem.LokiS3FS.getFileStatus(LokiS3FS.scala:74)
at com.databricks.common.filesystem.LokiFileSystem.getFileStatus(LokiFileSystem.scala:272)
at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1880)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:60)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:416)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:389)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:345)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:345)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:866)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: shaded.databricks.org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by AwsCredentialContextTokenProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [com.databricks.backend.daemon.driver.aws.AwsLocalCredentialContextTokenProvider@fd37933: No role specified and no roles available., com.databricks.backend.daemon.driver.aws.ProxiedIAMCredentialProvider@6d3127a1: User does not have any IAM roles]
at shaded.databricks.org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:239)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1269)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:845)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:794)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1372)
at shaded.databricks.org.apache.hadoop.fs.s3a.EnforcingDatabricksS3Client.getObjectMetadata(EnforcingDatabricksS3Client.scala:222)
at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$6(S3AFileSystem.java:2364)
at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:435)
at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:394)
at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2354)
at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2322)
at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:4122)
... 25 more
Caused by: com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [com.databricks.backend.daemon.driver.aws.AwsLocalCredentialContextTokenProvider@fd37933: No role specified and no roles available., com.databricks.backend.daemon.driver.aws.ProxiedIAMCredentialProvider@6d3127a1: User does not have any IAM roles]
at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:136)
at com.databricks.backend.daemon.driver.aws.AwsCredentialContextTokenProvider.getCredentials(AwsCredentialContextTokenProvider.scala:84)
at shaded.databricks.org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:194)
... 44 more
File <command-7886099615157681>
I've already checked my AWS IAM role, and it has all the necessary S3 permissions. Could someone help me troubleshoot this issue?
Thanks in advance!
- Labels: Spark
3 weeks ago
Hey @messiah,
The problem is most likely in your cluster configuration.
When using a Shared Cluster in Databricks, the Instance Profile assigned to the cluster will not be used to authenticate access to AWS resources like S3. This is because Shared Clusters operate in a multi-user environment, where permissions and credentials are tied to the individual user rather than the cluster itself.
To work around this limitation, the best approach is to use External Locations if your workspace is enabled for Unity Catalog. External Locations allow administrators to define and manage access to cloud storage at the Unity Catalog level, ensuring that users can read and write data securely without needing direct access to AWS credentials.
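If you go down the External Location route, the setup looks roughly like the sketch below (all names are placeholders, and it assumes a storage credential already exists and that you have the privileges to create external locations and grant on them):

```python
# Rough sketch, not a drop-in script: `my_storage_cred` and `my_s3_location`
# are placeholder names, and creating/granting requires the appropriate
# Unity Catalog privileges.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS my_s3_location
    URL 's3://<bucket-name>/path'
    WITH (STORAGE CREDENTIAL my_storage_cred)
""")
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION my_s3_location TO `account users`")

# Once the external location covers the path, the original read works without
# any instance profile on the cluster:
df = spark.read.parquet("s3://<bucket-name>/path/")
```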
If Unity Catalog is not available or External Locations are not an option, a simple and effective alternative is to use a Single-User Cluster instead of a Shared Cluster. Single-User Clusters operate in an isolated environment where all commands run under the same user identity, allowing the assigned Instance Profile to be applied correctly. This means the cluster will have seamless access to AWS resources without requiring additional authentication mechanisms.
By leveraging External Locations where possible or switching to Single-User Clusters, you can avoid authentication issues while ensuring secure and efficient access to cloud storage in Databricks.
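Whichever option you pick (external location or an instance profile on a single-user cluster), a quick sanity check before retrying the Parquet read could look like this (path is a placeholder):

```python
# If credentials are wired up correctly, listing the prefix should succeed;
# if it still raises AccessDenied, the cluster is not picking up the role.
files = dbutils.fs.ls("s3://<bucket-name>/path/")
display(files)
```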
Hope that helps 🙂
3 weeks ago - last edited 3 weeks ago
Hey @Isi
I'm using a single-node cluster with DBR 14.3 LTS, and Unity Catalog is enabled. I also tried DBR 15.x LTS, but the issue persists.
I set up the workspace using the quickstart method, which created four Databricks roles: two for Lambda, one for EC2 instance setup, and one for S3. These roles are linked to my workspace. The cluster configuration shows "None" for the instance profile.
I also tried enabling credential passthrough, but nothing worked.
Is there any other fix for this?
3 weeks ago
Hi @messiah,
Make sure the IAM role your cluster actually assumes grants at least s3:ListBucket on the bucket and s3:GetObject on the objects in it.
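For reference, a minimal sketch of what those two actions usually look like in the policy attached to that role (bucket name is a placeholder, written here as a Python dict for readability):

```python
# s3:ListBucket applies to the bucket ARN itself; s3:GetObject applies to the
# objects underneath it. Attach the equivalent JSON to the role in IAM.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::<bucket-name>"],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::<bucket-name>/*"],
        },
    ],
}
```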