IMPORTANT NOTE: we have delayed this feature rollout by 1 month. Please make these changes by April 15, 2024 instead of March 15, 2024, as previously stated.
----------------------
Databricks’ control plane will soon migrate to using AWS S3 gateway endpoints, AWS’s dedicated solution for storage access, to reach S3 storage. Action is required only if you use IP-based access rules to restrict access to AWS S3 storage (see below). Failure to take action before March 15, 2024, may lead to communication issues with Databricks services such as Unity Catalog, the S3 commit service, and the file system service. Please read below for additional details.
Background
Some Databricks operations on AWS S3 buckets originate from the Databricks control plane. As a result, today, customers who restrict access to AWS S3 storage must allow access from the Databricks control plane network address translation (NAT) IPs.
On March 15, 2024, AWS S3 intra-region calls originating from the Databricks control plane will start using S3 gateway endpoints rather than Databricks’ NAT IPs, since gateway endpoints are AWS’s dedicated and scalable solution for storage access. Therefore, customers who restrict access to AWS S3 storage must also allow access from the S3 gateway endpoints before March 15, 2024.
Once Databricks completes the migration to S3 gateway endpoints, the Databricks control plane NAT IPs will no longer be used for intra-region communication. Note that if the S3 storage is in a different region than the Databricks control plane, communication will still go over a NAT gateway and will therefore continue to use the NAT IPs. If your Databricks control plane and S3 bucket are in the same region and you plan to remove the Databricks control plane NAT IPs from your S3 access rules, please wait until May 15, 2024 before doing so.
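For reference, the following is a minimal sketch (using Python and boto3; not an official Databricks tool) for checking whether a bucket is in the same region as your control plane and is therefore affected by the gateway endpoint migration. The control plane region and bucket name are placeholders you must replace.

import boto3

CONTROL_PLANE_REGION = "us-west-2"  # assumption: replace with your workspace's control plane region
BUCKET_NAME = "<bucket_name>"       # placeholder: replace with your bucket name

s3 = boto3.client("s3")
# get_bucket_location returns a null LocationConstraint for buckets in us-east-1
location = s3.get_bucket_location(Bucket=BUCKET_NAME).get("LocationConstraint") or "us-east-1"

if location == CONTROL_PLANE_REGION:
    print(f"{BUCKET_NAME} is intra-region ({location}); its traffic moves to the S3 gateway endpoint.")
else:
    print(f"{BUCKET_NAME} is cross-region ({location}); its traffic continues to use the NAT IPs.")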
Action Required
If you do not use IP-based access rules that restrict access to your AWS S3 buckets to the Databricks control plane NAT IPs, no action is required.
If you have one or more access policies for S3 storage that include a condition on the NAT IPs, you must update those policies to also include the Databricks VPC IDs associated with these S3 gateway endpoints. Step-by-step instructions and a sample policy update to help you make this change can be found below.
Step-by-step instructions can be found in the AWS documentation and are summarized below for convenience.
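Before editing anything, it can help to list which statements in your current bucket policy actually key on source IPs. The following is a minimal sketch using Python and boto3 (not part of the AWS instructions above); the bucket name is a placeholder.

import json
import boto3

BUCKET_NAME = "<bucket_name>"  # placeholder: replace with your bucket name

s3 = boto3.client("s3")
# get_bucket_policy returns the policy as a JSON string under the "Policy" key
policy = json.loads(s3.get_bucket_policy(Bucket=BUCKET_NAME)["Policy"])

for statement in policy.get("Statement", []):
    condition = statement.get("Condition", {})
    # Statements that key on aws:SourceIp are the ones affected by this migration.
    if "aws:SourceIp" in condition.get("IpAddress", {}):
        sid = statement.get("Sid", "<no Sid>")
        print(f"Statement '{sid}' uses an IP-based condition and needs a companion "
              f"statement that allows the S3 gateway endpoint.")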
Sample policy update:
Current policy, which includes a condition on Databricks' NAT IPs:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDatabricks",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Principal": "*",
      "Resource": [
        "arn:aws:s3:::<bucket_name>/*",
        "arn:aws:s3:::<bucket_name>"
      ],
      "Condition": {
        "ArnEquals": {
          "aws:PrincipalArn": "<role arn>"
        },
        "IpAddress": {
          "aws:SourceIp": "<databricks ip block>"
        }
      }
    }
  ]
}
Updated policy, which allows traffic from both the Databricks control plane NAT IPs and the control plane VPC IDs used by the S3 gateway endpoints:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDatabricks-public-ip",
      "Effect": "Allow",
      "Principal": "*",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>/*",
        "arn:aws:s3:::<bucket_name>"
      ],
      "Condition": {
        "ArnEquals": {
          "aws:PrincipalArn": "<role arn>"
        },
        "IpAddress": {
          "aws:SourceIp": "<databricks ip block>"
        }
      }
    },
    {
      "Sid": "AllowDatabricks-s3-gateway",
      "Effect": "Allow",
      "Principal": "*",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>/*",
        "arn:aws:s3:::<bucket_name>"
      ],
      "Condition": {
        "ArnEquals": {
          "aws:PrincipalArn": "<role arn>"
        },
        "StringEquals": {
          "aws:SourceVPC": "<databricks VPC>"
        }
      }
    }
  ]
}
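Once you have filled in your bucket name, role ARN, IP block, and VPC ID, the updated policy can be applied through the S3 console, the AWS CLI, or an SDK. As a minimal sketch with Python and boto3, assuming the completed policy above has been saved locally as updated-policy.json (a hypothetical file name):

import boto3

BUCKET_NAME = "<bucket_name>"  # placeholder: replace with your bucket name

# Read the completed policy document prepared from the sample above.
with open("updated-policy.json") as f:
    updated_policy = f.read()

s3 = boto3.client("s3")
# put_bucket_policy replaces the bucket's existing policy with the supplied document.
s3.put_bucket_policy(Bucket=BUCKET_NAME, Policy=updated_policy)
print(f"Updated bucket policy applied to {BUCKET_NAME}.")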
Please contact help@databricks.com with any questions about this change.
Thank you,
Databricks