Product Platform Updates
Stay informed about the latest updates and enhancements to the Databricks platform. Learn about new features, improvements, and best practices to optimize your data analytics workflow.
AlexEsibov
Contributor II

IMPORTANT NOTE: we have delayed this feature rollout by 1 month. Please make these changes by April 15, 2024 instead of March 15, 2024, as previously stated.
----------------------

Databricks’ control plane will soon migrate to using AWS S3 gateway endpoints, the dedicated solution by AWS for storage access, to reach S3 storage. Action is required only if you use IP-based access rules to restrict access to AWS S3 storage (see below). Failure to take action before March 15, 2024, may lead to communication issues with Databricks services such as Unity Catalog, the S3 commit service, and the file system service. Please read below for additional details.

Background

Some Databricks operations on AWS S3 buckets originate from the Databricks control plane. As a result, customers who restrict access to AWS S3 storage must today allow access from the Databricks control plane network address translation (NAT) IPs.

On March 15, 2024, AWS S3 intra-region calls originating from the Databricks control plane will start using S3 gateway endpoints rather than Databricks’ NAT IPs, because gateway endpoints are the dedicated and scalable solution by AWS for storage access. Therefore, customers who restrict access to AWS S3 storage must also allow access from the S3 gateway endpoints before March 15, 2024.

Once the migration to use S3 gateways is completed by Databricks, the Databricks control plane NAT IPs will become obsolete for intra-region communications. Note that if the S3 storage is in a different region than the Databricks control plane, communication will still go over a NAT gateway and therefore will continue to use NAT IPs. If your Databricks control plane and S3 bucket are in the same region and you plan to remove the Databricks control plane NAT IPs from your S3 access rules, please allow until May 15, 2024 before doing so. 
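To make the intra-region versus cross-region distinction concrete, here is a minimal sketch of the routing rule described above. The function name and the returned labels are hypothetical and exist only for illustration; they are not part of any Databricks or AWS API.

```python
def control_plane_s3_path(control_plane_region: str, bucket_region: str) -> str:
    """Return which path control plane calls to S3 take after the migration.

    Hypothetical helper: the labels are illustrative, not real identifiers.
    """
    if control_plane_region == bucket_region:
        # Intra-region: traffic arrives via the S3 gateway endpoint,
        # so the bucket policy must allow the Databricks VPC IDs.
        return "s3-gateway-endpoint"
    # Cross-region: traffic still goes over a NAT gateway,
    # so the bucket policy must keep allowing the NAT IPs.
    return "nat-gateway"


print(control_plane_s3_path("us-east-1", "us-east-1"))  # prints s3-gateway-endpoint
print(control_plane_s3_path("us-east-1", "eu-west-1"))  # prints nat-gateway
```

In other words, a bucket in the same region as the control plane needs the VPC ID rule, while a bucket in another region continues to depend on the NAT IP rule.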

Action Required

If you do not have IP access rules to restrict access from the Databricks control plane NAT IPs to AWS S3 buckets, there is no action required.

If one or more of your access policies for S3 storage include a condition on the NAT IPs, you must update each policy to also include Databricks’ VPC IDs for these S3 gateway endpoints. Step-by-step instructions, a sample policy update, and additional resources can be found below.
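One way to check whether a given bucket policy falls into the "action required" case is to look for statements whose condition references `aws:SourceIp`. The sketch below assumes the policy has already been loaded into a Python dict; the helper name `statements_with_source_ip` is hypothetical.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """IAM allows a single object where a list is expected; normalize to a list."""
    return value if isinstance(value, list) else [value]


def statements_with_source_ip(policy: dict) -> list:
    """Return the Sids of statements whose Condition references aws:SourceIp."""
    matches = []
    for index, statement in enumerate(_as_list(policy.get("Statement", []))):
        for operator_block in statement.get("Condition", {}).values():
            # Condition key names are case-insensitive, so compare lowercased.
            if any(key.lower() == "aws:sourceip" for key in operator_block):
                matches.append(statement.get("Sid", f"statement-{index}"))
                break
    return matches


example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDatabricks",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::<bucket_name>/*"],
            "Condition": {
                "ArnEquals": {"aws:PrincipalArn": "<role arn>"},
                "IpAddress": {"aws:SourceIp": "<databricks ip block>"},
            },
        }
    ],
}

print(statements_with_source_ip(example_policy))  # prints ['AllowDatabricks']
```

An empty result suggests the policy has no IP-based rule. This check only inspects the policy document itself, so any restrictions enforced elsewhere are outside its scope.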

Step-by-step instructions can be found in AWS documentation here and are summarized below for convenience.

  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
  2. Select Buckets in the left-hand navigation.
  3. In the Buckets list, choose the name of the bucket that you want to create a bucket policy for.
  4. Choose the Permissions tab.
  5. Under Bucket policy, choose Edit. The Edit bucket policy page appears.
  6. On the Edit bucket policy page, create a policy with the relevant VPC IDs (sample policy and link to VPC IDs are available below).
  7. Choose Save changes.
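If you prefer to script the console steps above, the same edit can be sketched with the AWS SDK for Python. This is a minimal sketch, not an official procedure: it assumes `s3_client` is a boto3 S3 client (`boto3.client("s3")`) with permission to read and write the bucket policy, and the helper name `save_bucket_policy` is hypothetical.

```python
import json


def save_bucket_policy(s3_client, bucket_name: str, policy: dict) -> str:
    """Replace the bucket policy and return the previous policy JSON as a backup.

    `s3_client` is assumed to be a boto3 S3 client. Note that
    get_bucket_policy raises an error if the bucket has no policy yet.
    """
    previous = s3_client.get_bucket_policy(Bucket=bucket_name)["Policy"]
    s3_client.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))
    return previous


# Example usage (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# s3 = boto3.client("s3")
# backup = save_bucket_policy(s3, "<bucket_name>", updated_policy)
```

Keeping the returned backup string makes it straightforward to restore the earlier policy if the new one turns out to be too restrictive.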

Sample policy update:

Current policy configuration, which includes Databricks' NAT IPs:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDatabricks",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Principal": "*",
      "Resource": [
        "arn:aws:s3:::<bucket_name>/*",
        "arn:aws:s3:::<bucket_name>"
      ],
      "Condition": {
        "ArnEquals": {
          "aws:PrincipalArn": "<role arn>"
        },
        "IpAddress": {
          "aws:SourceIp": "<databricks ip block>"
        }
      }
    }
  ]
}


Updated policy configuration, which allows traffic from both the Databricks control plane VPC IDs and the NAT IPs:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDatabricks-public-ip",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::<bucket_name>/*",
                "arn:aws:s3:::<bucket_name>"
            ],
            "Condition": {
                "ArnEquals": {
                    "aws:PrincipalArn": "<role arn>"
                },
                "IpAddress": {
                    "aws:SourceIp": "<databricks ip block>"
                }
            }
        },
        {
            "Sid": "AllowDatabricks-s3-gateway",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::<bucket_name>/*",
                "arn:aws:s3:::<bucket_name>"
            ],
            "Condition": {
                "ArnEquals": {
                    "aws:PrincipalArn": "<role arn>"
                },
                "StringEquals": {
                    "aws:SourceVPC": "<databricks VPC>"
                }
            }
        }
    ]
}
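The transformation from the current policy to the updated one can also be sketched in code. The helper below is an illustration under stated assumptions, not a Databricks-provided tool: it only handles `Allow` statements that use the `IpAddress` operator with the exact key `aws:SourceIp`, it leaves the original statement's Sid unchanged (the sample above renames it), and the name `add_gateway_statements` and the VPC ID value are hypothetical placeholders.

```python
import copy


def add_gateway_statements(policy: dict, vpc_ids: list) -> dict:
    """Return a copy of `policy` with an aws:SourceVpc twin for each
    Allow statement that is conditioned on aws:SourceIp."""
    updated = copy.deepcopy(policy)
    statements = updated.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    twins = []
    for statement in statements:
        ip_block = statement.get("Condition", {}).get("IpAddress", {})
        if statement.get("Effect") != "Allow" or "aws:SourceIp" not in ip_block:
            continue
        twin = copy.deepcopy(statement)
        # Remove only the IP condition; keep all others (for example ArnEquals).
        twin_ip = twin["Condition"]["IpAddress"]
        del twin_ip["aws:SourceIp"]
        if not twin_ip:
            del twin["Condition"]["IpAddress"]
        # Condition key names are case-insensitive; "aws:SourceVpc" is the
        # spelling used in AWS documentation.
        twin["Condition"].setdefault("StringEquals", {})["aws:SourceVpc"] = list(vpc_ids)
        if "Sid" in twin:
            twin["Sid"] = f'{twin["Sid"]}-s3-gateway'
        twins.append(twin)
    updated["Statement"] = statements + twins
    return updated


current_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDatabricks",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": ["arn:aws:s3:::<bucket_name>/*", "arn:aws:s3:::<bucket_name>"],
            "Condition": {
                "ArnEquals": {"aws:PrincipalArn": "<role arn>"},
                "IpAddress": {"aws:SourceIp": "<databricks ip block>"},
            },
        }
    ],
}

# "<databricks VPC>" is a placeholder; use the VPC IDs from the Databricks documentation.
new_policy = add_gateway_statements(current_policy, ["<databricks VPC>"])
print([s["Sid"] for s in new_policy["Statement"]])
# prints ['AllowDatabricks', 'AllowDatabricks-s3-gateway']
```

Because the original NAT IP statement is kept alongside the new one, the resulting policy continues to work both before and after the migration.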

Other resources:

  • Databricks public documentation on restricting access to AWS S3 buckets can be found here
  • Databricks public documentation for the VPC IDs that must be allow-listed can be found here

Please contact help@databricks.com with any questions about this change.

Thank you,

Databricks
