Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Access Denied 403 error when trying to access data in S3 with dlt pipeline using configured and working instance profile and mounted bucket

horatio
New Contributor II

I can read all of my S3 data without any issues after configuring my cluster with an instance profile. However, when I run the DLT decorator below, it fails with an access-denied error. Are there other IAM tweaks I need to make for Delta? Looking at the pipeline, it appears to fail while setting up tables in S3, after the initial read. Note that I also tried setting my storage location to a path in S3, with both the s3a:// and /mnt syntax, with no luck. I also noticed that if I set storage to my bucket, it hangs on waiting for resources and then fails with `DataPlaneException: Failed to start the DLT service on cluster`. Ultimately I want to use this with Auto Loader and cloudFiles, but this is a simplified test that should work anyway. Thanks!

# This raises a 403 java.nio.file.AccessDeniedException for the S3 location
import dlt
from pyspark.sql.functions import explode, col  # for later transformations; unused in this minimal repro

@dlt.table
def rtb_dlt_bids_bronze():
    return (
        spark.read.format("json")
        .option("multiLine", "true")
        .option("inferSchema", "true")
        .load("/mnt/demo/<pathtofile>")
    )
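For reference, the pipeline settings I'm testing with look roughly like the JSON below (the bucket, account ID, profile name, and notebook path are placeholders, not my actual values):

```json
{
  "name": "rtb_bids_bronze_test",
  "storage": "s3://<bucket>/dlt-storage",
  "clusters": [
    {
      "label": "default",
      "aws_attributes": {
        "instance_profile_arn": "arn:aws:iam::<account-id>:instance-profile/<profile-name>"
      }
    }
  ],
  "libraries": [
    { "notebook": { "path": "/Users/<me>/dlt_bids_bronze" } }
  ]
}
```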

On the other hand, this works fine:

display(spark.read.format("json")
         .option("multiLine", "true")
         .option("inferSchema", "true")
         .load("/mnt/demo/<pathtofile>"))

The full error from the DLT pipeline:

raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o772.load.
: java.nio.file.AccessDeniedException: s3a://<pathtofile>: getFileStatus on s3a://<pathtofile>: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request: HEAD https://<pathtofile>; {} Hadoop 3.3.1, aws-sdk-java/1.12.189 Linux/5.4.0-1075-aws OpenJDK_64-Bit_Server_VM/25.302-b08 java/1.8.0_302 scala/2.12.14 vendor/Azul_Systems,_Inc. cfg/retry-mode/legacy com.amazonaws.services.s3.model.GetObjectMetadataRequest

3 REPLIES

jose_gonzalez
Databricks Employee

How did you create your mount point? Could you share more details, please?
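For context, a typical instance-profile-backed S3 mount is created from a notebook along these lines (the bucket name and mount point are placeholders, and `dbutils` only exists inside a Databricks workspace, so this is just a sketch):

```python
# Databricks notebook snippet -- `dbutils` and `display` are predefined in the
# notebook environment. Bucket name and mount point are placeholders.
aws_bucket_name = "<bucket-name>"
mount_name = "demo"

# Mount the bucket; credentials come from the cluster's instance profile.
dbutils.fs.mount(
    source=f"s3a://{aws_bucket_name}",
    mount_point=f"/mnt/{mount_name}",
)

# Sanity check: list the mounted contents.
display(dbutils.fs.ls(f"/mnt/{mount_name}"))
```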

Vidula
Honored Contributor

Hi @Robby Kiskanyan​ 

Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

BradSheridan
Valued Contributor

@Robby Kiskanyan​ did you ever resolve this? I'm facing the same exact issue right now.

thanks,

Brad
