
Access Denied 403 error when trying to access data in S3 with dlt pipeline using configured and working instance profile and mounted bucket

horatio
New Contributor II

I can read all of my S3 data without any issues after configuring my cluster with an instance profile. However, when I try to run the following dlt decorator, it gives me an access denied error. Are there some other IAM tweaks I need to make for Delta? Looking at the pipeline, it seems to fail when setting up tables in S3 after the initial read. I also tried setting my storage location to a path in S3, with both the s3a:// and /mnt syntax, with no luck either. When I set storage to my bucket, the pipeline hangs on waiting for resources before failing with `DataPlaneException: Failed to start the DLT service on cluster`. Ultimately I would use this with Auto Loader and cloudFiles, but this is a simplified test which should work anyway. Thanks.

# This gives me a 403 java.nio.file.AccessDeniedException on the S3 location
import dlt
from pyspark.sql.functions import explode, col

@dlt.table
def rtb_dlt_bids_bronze():
    # Batch-read the multi-line JSON file through the mounted bucket
    return (
        spark.read.format("json")
        .option("multiLine", "true")
        .option("inferSchema", "true")
        .load("/mnt/demo/<pathtofile>"))

On the other hand, this works fine:

display(spark.read.format("json")
         .option("multiLine", "true")
         .option("inferSchema", "true")
         .load("/mnt/demo/<pathtofile>"))

Error from the dlt pipeline:

raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o772.load.
: java.nio.file.AccessDeniedException: s3a://<pathtofile>: getFileStatus on s3a://<pathtofile>: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request: HEAD https://<pathtofile>; {} Hadoop 3.3.1, aws-sdk-java/1.12.189 Linux/5.4.0-1075-aws OpenJDK_64-Bit_Server_VM/25.302-b08 java/1.8.0_302 scala/2.12.14 vendor/Azul_Systems,_Inc. cfg/retry-mode/legacy com.amazonaws.services.s3.model.GetObjectMetadataRequest
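
One thing worth checking, offered as a possible cause rather than a confirmed diagnosis: a DLT pipeline starts its own clusters from the pipeline settings, so an instance profile attached to an interactive cluster is not automatically used by the pipeline. Below is a minimal sketch of pipeline settings that attach the profile to both the `default` and `maintenance` cluster labels. The helper name `build_pipeline_settings`, the pipeline name, the notebook path, the storage path, and the ARN are all hypothetical placeholders, not values from this thread.

```python
import json


def build_pipeline_settings(name, notebook_path, storage, instance_profile_arn):
    """Return a DLT pipeline settings dict that attaches an instance profile.

    Hypothetical helper: every argument is a caller-supplied placeholder.
    """
    # Attach the same instance profile to both pipeline cluster labels,
    # so update runs and maintenance tasks use the same S3 permissions.
    clusters = [
        {
            "label": label,
            "aws_attributes": {"instance_profile_arn": instance_profile_arn},
        }
        for label in ("default", "maintenance")
    ]
    return {
        "name": name,
        "storage": storage,
        "clusters": clusters,
        "libraries": [{"notebook": {"path": notebook_path}}],
        "continuous": False,
    }


settings = build_pipeline_settings(
    name="rtb_dlt_demo",                       # hypothetical pipeline name
    notebook_path="/Repos/<user>/<notebook>",  # placeholder
    storage="s3a://<bucket>/<prefix>",         # placeholder
    instance_profile_arn="arn:aws:iam::<account-id>:instance-profile/<profile-name>",
)

# Paste the printed JSON into the pipeline's JSON settings editor.
print(json.dumps(settings, indent=2))
```

If the pipeline settings already contain the profile on both labels, the next thing to compare is whether the role grants write and list permissions on the storage path, since the pipeline writes tables and metadata there in addition to reading the source file.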

3 REPLIES

jose_gonzalez
Moderator

How did you create your mount point? Could you share more details, please?
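
The mount details matter because the stack trace reports an s3a:// URI while the code loads from /mnt, so it helps to know exactly which bucket path the mount resolves to. Here is a minimal sketch for mapping a mount path to its source URI. The function `resolve_mount` and the bucket name are hypothetical; on Databricks the mount table could be built from `dbutils.fs.mounts()`, as noted in the comment.

```python
def resolve_mount(path, mounts):
    """Map a DBFS mount path to its underlying source URI.

    mounts: iterable of (mount_point, source) pairs.
    Returns None when no mount point is a prefix of path.
    """
    best = None
    for mount_point, source in mounts:
        prefix = mount_point.rstrip("/")
        # Match whole path segments only, and prefer the longest mount point.
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, source.rstrip("/"))
    if best is None:
        return None
    prefix, source = best
    return source + path[len(prefix):]


# Hypothetical mount table. On Databricks it could be built with:
#   mounts = [(m.mountPoint, m.source) for m in dbutils.fs.mounts()]
mounts = [("/mnt/demo", "s3a://example-bucket/demo")]

print(resolve_mount("/mnt/demo/bids/file.json", mounts))
# prints: s3a://example-bucket/demo/bids/file.json
```

Loading the resolved s3a:// URI directly inside the `@dlt.table` function is one way to tell a mount problem apart from a permissions problem on the pipeline's cluster.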

Vidula
Honored Contributor

Hi @Robby Kiskanyan​ 

Hope all is well! I just wanted to check whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

BradSheridan
Valued Contributor

@Robby Kiskanyan​ did you ever resolve this? I'm facing the same exact issue right now.

thanks,

Brad
