<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: AutoLoader File Notification Setup on AWS in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/autoloader-file-notification-setup-on-aws/m-p/80154#M35951</link>
    <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have reviewed all the steps you mentioned, including the IAM policy, against the setup guide, and I believe the permissions granted are sufficient.&lt;/P&gt;&lt;P&gt;To rule out missing permissions, I also ran the script with IAM credentials that have admin privileges and still got the same error.&lt;/P&gt;&lt;P&gt;Could this issue be related to the UC storage credentials? Do you have any other suggestions on what to try?&lt;/P&gt;</description>
    <pubDate>Tue, 23 Jul 2024 14:03:48 GMT</pubDate>
    <dc:creator>Olaoye_Somide</dc:creator>
    <dc:date>2024-07-23T14:03:48Z</dc:date>
    <item>
      <title>AutoLoader File Notification Setup on AWS</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-file-notification-setup-on-aws/m-p/79279#M35719</link>
      <description>&lt;P class=""&gt;I’m encountering issues setting up Databricks AutoLoader in File Notification mode. The error seems to be related to UC access to the S3 bucket. I have tried running it on a single-node dedicated cluster but no luck.&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;&lt;EM&gt;Any guidance or assistance on resolving this issue would be greatly appreciated.&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;Documents referenced for setup:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;A href="https://docs.databricks.com/en/ingestion/auto-loader/file-notification-mode.html" target="_blank" rel="noopener nofollow noreferrer"&gt;https://docs.databricks.com/en/ingestion/auto-loader/file-notification-mode.html&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;&lt;A href="https://medium.com/@mattwinmill88/deploying-a-databricks-aws-end-to-end-pipeline-using-terraform-921c6b4a36a7" target="_blank" rel="noopener"&gt;https://medium.com/@mattwinmill88/deploying-a-databricks-aws-end-to-end-pipeline-using-terraform-921c6b4a36a7&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;&lt;A href="https://community.databricks.com/t5/data-engineering/autoloader-file-notification-mode-error-using-uc/m-p/77481#M35438" target="_blank" rel="noopener"&gt;https://community.databricks.com/t5/data-engineering/autoloader-file-notification-mode-error-using-uc/m-p/77481#M35438&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&amp;nbsp;&lt;/P&gt;&lt;P class=""&gt;Below is the error message received:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Py4JJavaError: An error occurred while calling o415.load.
: java.nio.file.AccessDeniedException: s3://bucket_name/folder: getFileStatus on s3://bucket_name/folder: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request: HEAD https://bucket_name.s3.eu-central-1.amazonaws.com folder {} Hadoop 3.3.6, aws-sdk-java/1.12.390 Linux/5.15.0-1063-aws OpenJDK_64-Bit_Server_VM/25.392-b08 java/1.8.0_392 scala/2.12.15 kotlin/1.6.0 vendor/Azul_Systems,_Inc. cfg/retry-mode/legacy com.amazonaws.services.s3.model.GetObjectMetadataRequest; Request ID: 7FH7VQPTTBFCER18, Extended Request ID: sRHyEyURC221EulMHsMHTxZzK0R1TabG9vPgPV2vl1GsWSSoYwuJxriQYTZxxTMgvJKmlFM/D4KH7x9SZU6pMGDU9Wojk+rYqX+MnajfxEQ=, Cloud Provider: AWS, Instance ID: i-0ff777fafb0f546c9 credentials-provider: com.amazonaws.auth.BasicSessionCredentials credential-header: AWS4-HMAC-SHA256 Credential=ASIA5X45VTLXYJ24XYPS/20240718/eu-central-1/s3/aws4_request signature-present: true (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 7FH7VQPTTBFCER18; S3 Extended Request ID: sRHyEyURC221EulMHsMHTxZzK0R1TabG9vPgPV2vl1GsWSSoYwuJxriQYTZxxTMgvJKmlFM/D4KH7x9SZU6pMGDU9Wojk+rYqX+MnajfxEQ=; Proxy: null), S3 Extended Request ID: sRHyEyURC221EulMHsMHTxZzK0R1TabG9vPgPV2vl1GsWSSoYwuJxriQYTZxxTMgvJKmlFM/D4KH7x9SZU6pMGDU9Wojk+rYqX+MnajfxEQ=:403 Forbidden
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:292)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:197)&lt;/LI-CODE&gt;&lt;P class=""&gt;Auto Loader script:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;try:
    (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", schema_path)
        .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
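        # File notification mode (SNS/SQS events) instead of directory listing
        .option("cloudFiles.useNotifications", "true")
        .option("cloudFiles.region", "eu-central-1")
        # Pre-existing SQS queue: Auto Loader consumes events from it directly
        # instead of creating its own SNS topic and SQS queue
        .option(
            "cloudFiles.queueUrl",
            "https://sqs.eu-central-1.amazonaws.com/XXXXXX/databricks-auto-ingest-test",
        )
        # The 403 in the error above is raised by this call (o415.load)
        .load(f"s3://{bucket_name}/{bucket_prefix}")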
        .writeStream.option("checkpointLocation", checkpoint_path)
        .option("mergeSchema", "true")
        .trigger(availableNow=True)
        .toTable(f"{catalog_name}.{schema_name}.{delta_table_name}")
    )
except Exception as e:
    raise e&lt;/LI-CODE&gt;&lt;P&gt;IAM policy attached to the IAM role / instance profile:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DatabricksAutoLoaderSetup",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketNotification",
                "s3:PutBucketNotification",
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:GetLifecycleConfiguration",
                "s3:PutLifecycleConfiguration",
                "sns:ListSubscriptionsByTopic",
                "sns:GetTopicAttributes",
                "sns:SetTopicAttributes",
                "sns:CreateTopic",
                "sns:TagResource",
                "sns:Publish",
                "sns:Subscribe",
                "sqs:CreateQueue",
                "sqs:DeleteMessage",
                "sqs:ReceiveMessage",
                "sqs:SendMessage",
                "sqs:GetQueueUrl",
                "sqs:GetQueueAttributes",
                "sqs:SetQueueAttributes",
                "sqs:TagQueue",
                "sqs:ChangeMessageVisibility"
            ],
            "Resource": [
                "arn:aws:s3:::bucket_name",
                "arn:aws:sqs:eu-central-1:XXXXX:databricks-auto-ingest-*",
                "arn:aws:sns:eu-central-1:XXXXX:databricks-auto-ingest-*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::bucket_name/*"
            ]
        },
        {
            "Sid": "DatabricksAutoLoaderList",
            "Effect": "Allow",
            "Action": [
                "sqs:ListQueues",
                "sqs:ListQueueTags",
                "sns:ListTopics"
            ],
            "Resource": "*"
        },
        {
            "Sid": "DatabricksAutoLoaderTeardown",
            "Effect": "Allow",
            "Action": [
                "sns:Unsubscribe",
                "sns:DeleteTopic",
                "sqs:DeleteQueue"
            ],
            "Resource": [
                "arn:aws:sqs:eu-central-1:XXXX:databricks-auto-ingest-*",
                "arn:aws:sns:eu-central-1:XXXX:databricks-auto-ingest-*"
            ]
        }
    ]
}&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 18 Jul 2024 16:05:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-file-notification-setup-on-aws/m-p/79279#M35719</guid>
      <dc:creator>Olaoye_Somide</dc:creator>
      <dc:date>2024-07-18T16:05:00Z</dc:date>
    </item>
    <item>
      <title>Re: AutoLoader File Notification Setup on AWS</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-file-notification-setup-on-aws/m-p/80154#M35951</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have reviewed all the steps you mentioned, including the IAM policy, against the setup guide, and I believe the permissions granted are sufficient.&lt;/P&gt;&lt;P&gt;To rule out missing permissions, I also ran the script with IAM credentials that have admin privileges and still got the same error.&lt;/P&gt;&lt;P&gt;Could this issue be related to the UC storage credentials? Do you have any other suggestions on what to try?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 14:03:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-file-notification-setup-on-aws/m-p/80154#M35951</guid>
      <dc:creator>Olaoye_Somide</dc:creator>
      <dc:date>2024-07-23T14:03:48Z</dc:date>
    </item>
  </channel>
</rss>

