Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

AutoLoader File Notification Setup on AWS

Olaoye_Somide
New Contributor III

I'm encountering issues setting up Databricks AutoLoader in File Notification mode. The error seems to be related to Unity Catalog (UC) access to the S3 bucket. I have tried running it on a single-node dedicated cluster, but no luck.

Any guidance or assistance on resolving this issue would be greatly appreciated.

Documents referenced for setup: 

 

Below is the error message received:

 

Py4JJavaError: An error occurred while calling o415.load.
: java.nio.file.AccessDeniedException: s3://bucket_name/folder: getFileStatus on s3://bucket_name/folder: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request: HEAD https://bucket_name.s3.eu-central-1.amazonaws.com folder {} Hadoop 3.3.6, aws-sdk-java/1.12.390 Linux/5.15.0-1063-aws OpenJDK_64-Bit_Server_VM/25.392-b08 java/1.8.0_392 scala/2.12.15 kotlin/1.6.0 vendor/Azul_Systems,_Inc. cfg/retry-mode/legacy com.amazonaws.services.s3.model.GetObjectMetadataRequest; Request ID: 7FH7VQPTTBFCER18, Extended Request ID: sRHyEyURC221EulMHsMHTxZzK0R1TabG9vPgPV2vl1GsWSSoYwuJxriQYTZxxTMgvJKmlFM/D4KH7x9SZU6pMGDU9Wojk+rYqX+MnajfxEQ=, Cloud Provider: AWS, Instance ID: i-0ff777fafb0f546c9 credentials-provider: com.amazonaws.auth.BasicSessionCredentials credential-header: AWS4-HMAC-SHA256 Credential=ASIA5X45VTLXYJ24XYPS/20240718/eu-central-1/s3/aws4_request signature-present: true (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 7FH7VQPTTBFCER18; S3 Extended Request ID: sRHyEyURC221EulMHsMHTxZzK0R1TabG9vPgPV2vl1GsWSSoYwuJxriQYTZxxTMgvJKmlFM/D4KH7x9SZU6pMGDU9Wojk+rYqX+MnajfxEQ=; Proxy: null), S3 Extended Request ID: sRHyEyURC221EulMHsMHTxZzK0R1TabG9vPgPV2vl1GsWSSoYwuJxriQYTZxxTMgvJKmlFM/D4KH7x9SZU6pMGDU9Wojk+rYqX+MnajfxEQ=:403 Forbidden
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:292)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:197)
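For triage, the useful fields buried in a long S3A error like the one above are the HTTP status and the request ID, which can be looked up in S3 server access logs or CloudTrail. A small stdlib sketch (a hypothetical helper, not a Databricks API) that pulls them out:

```python
import re

def summarize_s3_error(message: str) -> dict:
    """Extract the HTTP status code and S3 request ID from an S3A error message."""
    status = re.search(r"Status Code: (\d+)", message)
    request_id = re.search(r"Request ID: ([A-Z0-9]+)", message)
    return {
        "status": int(status.group(1)) if status else None,
        "request_id": request_id.group(1) if request_id else None,
    }
```

With the request ID in hand, AWS support or CloudTrail can show which IAM principal actually signed the failing HEAD request, which helps distinguish an instance-profile problem from a UC storage-credential problem.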


Autoloader Script:

 

# Auto Loader stream in file notification mode. The original try/except that
# only re-raised the exception added nothing, so the query is started directly
# and awaited so the availableNow batch runs to completion.
query = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", schema_path)
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.region", "eu-central-1")
    .option(
        "cloudFiles.queueUrl",
        "https://sqs.eu-central-1.amazonaws.com/XXXXXX/databricks-auto-ingest-test",
    )
    .load(f"s3://{bucket_name}/{bucket_prefix}")
    .writeStream.option("checkpointLocation", checkpoint_path)
    .option("mergeSchema", "true")
    .trigger(availableNow=True)
    .toTable(f"{catalog_name}.{schema_name}.{delta_table_name}")
)
query.awaitTermination()
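Since the policy below grants SQS and SNS actions only on `databricks-auto-ingest-*` resources, one thing worth checking is that the `cloudFiles.queueUrl` in the script maps to an ARN those patterns actually cover. A small stdlib sketch, assuming the standard `https://sqs.<region>.amazonaws.com/<account>/<name>` queue URL format (the account ID below is made up):

```python
from urllib.parse import urlparse
import fnmatch

def sqs_url_to_arn(queue_url: str) -> str:
    """Convert a standard SQS queue URL into its ARN."""
    parsed = urlparse(queue_url)
    region = parsed.netloc.split(".")[1]  # sqs.<region>.amazonaws.com
    account, name = parsed.path.strip("/").split("/")
    return f"arn:aws:sqs:{region}:{account}:{name}"

def arn_matches(arn: str, pattern: str) -> bool:
    """Check an ARN against an IAM-style wildcard pattern (simplified)."""
    return fnmatch.fnmatchcase(arn, pattern)

arn = sqs_url_to_arn(
    "https://sqs.eu-central-1.amazonaws.com/123456789012/databricks-auto-ingest-test"
)
covered = arn_matches(arn, "arn:aws:sqs:eu-central-1:123456789012:databricks-auto-ingest-*")
```

This only catches naming mismatches; it does not prove the role can actually reach the queue.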


IAM Policy attached to IAM Role / Instance profile:

 

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DatabricksAutoLoaderSetup",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketNotification",
                "s3:PutBucketNotification",
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:GetLifecycleConfiguration",
                "s3:PutLifecycleConfiguration",
                "sns:ListSubscriptionsByTopic",
                "sns:GetTopicAttributes",
                "sns:SetTopicAttributes",
                "sns:CreateTopic",
                "sns:TagResource",
                "sns:Publish",
                "sns:Subscribe",
                "sqs:CreateQueue",
                "sqs:DeleteMessage",
                "sqs:ReceiveMessage",
                "sqs:SendMessage",
                "sqs:GetQueueUrl",
                "sqs:GetQueueAttributes",
                "sqs:SetQueueAttributes",
                "sqs:TagQueue",
                "sqs:ChangeMessageVisibility"
            ],
            "Resource": [
                "arn:aws:s3:::bucket_name",
                "arn:aws:sqs:eu-central-1:XXXXX:databricks-auto-ingest-*",
                "arn:aws:sns:eu-central-1:XXXXX:databricks-auto-ingest-*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::bucket_name/*"
            ]
        },
        {
            "Sid": "DatabricksAutoLoaderList",
            "Effect": "Allow",
            "Action": [
                "sqs:ListQueues",
                "sqs:ListQueueTags",
                "sns:ListTopics"
            ],
            "Resource": "*"
        },
        {
            "Sid": "DatabricksAutoLoaderTeardown",
            "Effect": "Allow",
            "Action": [
                "sns:Unsubscribe",
                "sns:DeleteTopic",
                "sqs:DeleteQueue"
            ],
            "Resource": [
                "arn:aws:sqs:eu-central-1:XXXX:databricks-auto-ingest-*",
                "arn:aws:sns:eu-central-1:XXXX:databricks-auto-ingest-*"
            ]
        }
    ]
}
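The object-level statement above does grant `s3:GetObject` on `arn:aws:s3:::bucket_name/*`, which is the permission the failing HEAD request needs. A simplified sanity-check sketch over such a policy document (real IAM evaluation also considers Deny statements, conditions, resource policies, and SCPs; this only scans Allow statements):

```python
import fnmatch

def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Return True if any Allow statement matches the action and resource."""
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Allow":
            continue
        actions = stmt["Action"] if isinstance(stmt["Action"], list) else [stmt["Action"]]
        resources = stmt["Resource"] if isinstance(stmt["Resource"], list) else [stmt["Resource"]]
        if any(fnmatch.fnmatchcase(action, a) for a in actions) and any(
            fnmatch.fnmatchcase(resource, r) for r in resources
        ):
            return True
    return False

# Trimmed copy of the object-level statement from the policy above.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:PutObjectAcl"],
            "Resource": ["arn:aws:s3:::bucket_name/*"],
        }
    ],
}
can_read = is_allowed(policy, "s3:GetObject", "arn:aws:s3:::bucket_name/folder/part-0.json")
```

If a check like this passes for the instance profile's policy yet the 403 persists, the failing request is likely being signed by a different principal, such as the role behind the UC storage credential, which would need the same S3 grants.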


1 Reply

Olaoye_Somide
New Contributor III

Thanks @Retired_mod

I have reviewed all the steps mentioned, including the IAM policy, as per the setup guide. I believe the permissions granted are sufficient for the setup.

To validate the permissions, I used IAM credentials with Admin privileges in the script and still encountered the same error.

Could this issue be related to the UC storage credentials? Do you have any other suggestions on what to try?
