User16581306062
New Contributor III

Configuring DNS resolution for Private Workspaces

Intro

For customers on the E2 Platform, Databricks has a feature that allows them to use AWS PrivateLink to provision secure private workspaces by creating VPC endpoints to both the front-end and back-end interfaces of the Databricks infrastructure. The front-end VPC endpoint ensures that users connect to the Databricks web application, REST APIs and JDBC/ODBC interface over their private network. The back-end VPC endpoints ensure that clusters in their own managed VPC connect to the secure cluster connectivity relay and REST APIs over the AWS network backbone.

We previously covered how customers can leverage AWS Route 53 outbound resolver endpoints to allow workspaces deployed in their own VPC to resolve custom hostnames hosted on customer-managed DNS servers. When PrivateLink is used for the front-end, the workspace URL must resolve to the private IP of the PrivateLink interface so that the workspace can be reached over private connectivity (from on-premises or from other connected VPCs).

In this post I will show how to use Route 53 inbound endpoints to enable DNS name resolution of workspaces with PrivateLink enabled for the front-end interface. I will also demonstrate how customers who manage workspace deployments with Terraform can add this configuration to their pipeline and automatically make private workspaces accessible over a private network.

Architecture

The following diagram shows how a client on the customer's on-premises network sends a request to the corporate DNS server, which has a forwarding rule configured for the cloud.databricks.com domain. The DNS query is forwarded to the IP of the Resolver endpoint in AWS. The endpoint resolves the query against the Private Hosted Zone associated with its VPC, where a record maps the workspace URL to the private IP of the front-end PrivateLink interface.

[Image: architecture diagram]

The key components of the architecture are:

  1. On-premises corporate DNS server with a forwarding rule for the cloud.databricks.com domain
  2. Private connectivity between the corporate data centre and the AWS VPC. This connectivity can be established using AWS Direct Connect or an IPSec VPN
  3. A Private Hosted Zone (PHZ) in Route 53 for the cloud.databricks.com domain. For each workspace created a new record needs to be added to resolve the workspace name to the PrivateLink interface IP address
  4. A Route 53 Resolver inbound endpoint that looks up DNS records in the private hosted zone and returns the response to the on-premises DNS server
  5. Databricks workspaces with PrivateLink for the front-end interface (Web App and REST APIs)

DNS Records

In order for the platform to work properly there are a few records that need to be created in the PHZ. These records will allow clusters to connect to the backend REST APIs and to the Secure Cluster Connectivity relay.  

You also need records that keep publicly accessible URLs such as "accounts.cloud.databricks.com" resolvable. This is necessary because the workspace URL in the PrivateLink implementation shares the same domain as the Databricks accounts web page.

It is strongly recommended that you review the AWS Considerations for hosted private zones document before implementing this feature.

Below you can find a few considerations and a summary of the records needed in your PHZ for a successful workspace deployment:

  1. When you forward the cloud.databricks.com domain to the PHZ in AWS, you need to ensure that all workspaces are registered on that domain
  2. When a workspace is created there are two hostnames associated with it:
    1. The first is the URL used to login to the workspace, such as yourcompany.cloud.databricks.com
    2. The second is a URL for the Spark Driver proxy, which is used to access services like the Spark UI and Web Terminal. This hostname has the format dbc-dp-<workspace-id>.cloud.databricks.com
    3. Both of these hosts need to be registered on the PHZ
  3. AWS Route 53 behavior for PHZs is that, if there is a matching PHZ but there is no record that matches the domain name and type in the request, the Resolver doesn't forward the request to a public DNS resolver. Instead, it returns NXDOMAIN (non-existent domain) to the client. This means that you need to add records to your PHZ to resolve the publicly reachable accounts.cloud.databricks.com domain.
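As an illustration of point 2, the Spark Driver proxy hostname could be registered with a record like the sketch below. It assumes the resource names used in the Terraform section of this post (aws_route53_zone.databricks, data.aws_network_interface.workspace, databricks_mws_workspaces.workspace) and it assumes that the proxy hostname should point at the same front-end PrivateLink interface IPs as the workspace URL. The resource name spark_driver_proxy is hypothetical. Check both assumptions against your own deployment before using it.

```hcl
# Sketch only: registers dbc-dp-<workspace-id>.cloud.databricks.com in the PHZ.
# Assumption: the record targets the same front-end PrivateLink interface IPs
# as the workspace URL record.
resource "aws_route53_record" "spark_driver_proxy" {
  zone_id = aws_route53_zone.databricks.zone_id
  name    = "dbc-dp-${databricks_mws_workspaces.workspace.workspace_id}.cloud.databricks.com"
  type    = "A"
  ttl     = 300

  # One record set holding every private IP of the front-end VPC endpoint
  records = [for eni in data.aws_network_interface.workspace : eni.private_ip]
}
```

Because a matching PHZ returns NXDOMAIN for any name it does not contain, this record has to exist for every workspace, alongside the workspace URL record.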

The table below summarizes the records required for your PHZ. Please note that the Workspace URL and Spark Driver proxy URL are required for each Databricks workspace.

[Image: table of required PHZ records]

Terraform Code

When creating a new workspace, the DNS record can be created on your PHZ as part of your CI/CD pipeline. The following code shows how to create the inbound endpoint, the PHZ and the DNS record using Terraform:

# Creates an inbound Route 53 Resolver endpoint
resource "aws_route53_resolver_endpoint" "listener" {
  name      = "dns-inbound-resolver"
  direction = "INBOUND"
  
  security_group_ids = [
    aws_security_group.dns-sg.id
  ]
 
  ip_address {
    subnet_id = aws_subnet.mysubnet-1.id
  }
 
  ip_address {
    subnet_id = aws_subnet.mysubnet-2.id
  }
}
 
# Creates Private Hosted Zone for the Databricks domain
resource "aws_route53_zone" "databricks" {
  name = "cloud.databricks.com"
  vpc {
    vpc_id = aws_vpc.myvpc.id
  }
}
 
# Creates data source for the Databricks workspace front-end PrivateLink interface. 
data "aws_network_interface" "workspace" {
  for_each = aws_vpc_endpoint.workspace.network_interface_ids
  id       = each.value
}
 
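# Creates the DNS record using the PrivateLink interface IPs and the FQDN of the workspace.
# A single record set holds all interface IPs, because Route 53 does not allow
# several record sets with the same name and type in one zone.
resource "aws_route53_record" "workspace" {
  zone_id = aws_route53_zone.databricks.zone_id
  # trimprefix removes a leading "https://" if workspace_url includes one
  name    = trimprefix(databricks_mws_workspaces.workspace.workspace_url, "https://")
  type    = "A"
  ttl     = 300
  records = [for eni in data.aws_network_interface.workspace : eni.private_ip]
}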
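
The resolver endpoint above references aws_security_group.dns-sg, which is not shown in the post. A minimal sketch of what it might look like follows, together with an output that exposes the endpoint IPs for the forwarding rule on the corporate DNS server. The variable onprem_dns_cidrs, the security group name and the output name are hypothetical and only illustrate the idea. The sketch assumes the on-premises DNS servers reach the endpoint over the private connection on port 53.

```hcl
# Hypothetical variable: CIDR ranges of the on-premises DNS servers
variable "onprem_dns_cidrs" {
  description = "CIDR ranges allowed to query the inbound resolver endpoint"
  type        = list(string)
}

# Sketch of the security group referenced by the resolver endpoint
resource "aws_security_group" "dns-sg" {
  name        = "dns-inbound-resolver-sg"
  description = "Allow DNS queries to the Route 53 inbound endpoint"
  vpc_id      = aws_vpc.myvpc.id

  # DNS queries over UDP
  ingress {
    from_port   = 53
    to_port     = 53
    protocol    = "udp"
    cidr_blocks = var.onprem_dns_cidrs
  }

  # DNS queries over TCP (used for larger responses)
  ingress {
    from_port   = 53
    to_port     = 53
    protocol    = "tcp"
    cidr_blocks = var.onprem_dns_cidrs
  }
}

# IPs to use as forwarding targets for cloud.databricks.com on the corporate DNS server
output "inbound_resolver_ips" {
  value = [for addr in aws_route53_resolver_endpoint.listener.ip_address : addr.ip]
}
```

After terraform apply, the values of inbound_resolver_ips can be configured as the forwarders for the cloud.databricks.com domain on the on-premises DNS server.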


Kaniz
Community Manager

Hi @Alysson Souza​, Thank you for the valuable content you've been contributing to our community. Your well-researched insights and thought-provoking discussions have greatly benefited our community, inspiring us to grow and learn together.

Your dedication to sharing your knowledge is truly appreciated. Keep up the great work, and I'm looking forward to your future contributions!
