Authors: Kiran Anand, Suraj Karuvel
The serverless model provides customers with a fast and easy option to run their workloads in Azure Databricks. In this model, the compute runs in Databricks’ Azure tenant and not in yours. In many cases, you might need this serverless compute to privately connect to services in the Virtual Networks (VNets) in your Azure tenant. In addition, as more and more workloads move to Databricks Serverless, customers also need options to connect to their on-premises systems in a secure fashion. As a first step toward this, we need private connectivity from the Databricks Serverless compute plane to the customer’s VNet in their Azure tenant. This connectivity provides the first mile of the line of sight from Databricks Serverless to customers’ on-premises systems.
In this blog, we provide a quick overview of a new feature that enables such private connectivity from Databricks Serverless to the customer’s Azure VNet. For a detailed reference architecture (RA) and step-by-step enablement of this feature, please refer to this documentation.
This blog is an abridged (TL;DR) version of the RA referenced above, so some of its content is repeated here. If you are new to services like Load Balancer and Private Link, we suggest you refer to the RA for an end-to-end deployment and testing pattern.
The solution described in this blog connects the Azure Databricks Serverless compute plane to a standard load balancer (SLB) in the customer’s VNet. In most cases, customers will already have connectivity between their Azure tenant and their on-premises network, so this solution can be extended further based on their needs.
Note: If the service you are connecting to is not running on the VM, an IP forwarder should be configured for proper name resolution. The particulars of such a configuration are beyond the scope of this document; refer to the Microsoft Azure documentation for more details.
The following are the steps that we will discuss in the sections below:
1. Create an internal (private) standard load balancer in your VNet.
2. Create a private link service on the load balancer.
3. Create a network connectivity configuration (NCC) in the Databricks account console.
4. Create a private endpoint from the NCC to the private link service and approve it.
An internal or private load balancer provides inbound connectivity to your VMs in private network connectivity scenarios. Refer to this official Azure documentation for complete information on how to create an internal standard load balancer. In short, you configure a front-end IP configuration, a backend pool with your target VMs, a health probe, and the load-balancing rules.
You must have a virtual network and subnet created where the SLB can be deployed. It must be the same VNet in which your resource resides.
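If you prefer to script this step, below is a minimal sketch using the Azure SDK for Python (azure-identity and azure-mgmt-network). The resource group, region, VNet, subnet, load balancer name, and port 443 are placeholder assumptions for illustration; an equivalent setup can be done in the portal or with the Azure CLI.

```python
# A minimal sketch using the Azure SDK for Python (pip install azure-identity azure-mgmt-network).
# All names, the region, and port 443 are placeholder assumptions for illustration.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<subscription-id>"
rg = "<resource-group>"
location = "<azure-region>"
vnet = "<vnet-name>"

client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

subnet_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/{rg}"
    f"/providers/Microsoft.Network/virtualNetworks/{vnet}/subnets/<lb-subnet>"
)
lb_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/{rg}"
    "/providers/Microsoft.Network/loadBalancers/my-internal-slb"
)

# Internal Standard Load Balancer: a private front-end IP in the VNet subnet, a backend
# pool for the target VMs, a TCP health probe, and a load-balancing rule on port 443.
lb = client.load_balancers.begin_create_or_update(
    rg,
    "my-internal-slb",
    {
        "location": location,
        "sku": {"name": "Standard"},
        "frontend_ip_configurations": [
            {
                "name": "frontend-ip",
                "subnet": {"id": subnet_id},
                "private_ip_allocation_method": "Dynamic",
            }
        ],
        "backend_address_pools": [{"name": "backend-pool"}],
        "probes": [{"name": "health-probe", "protocol": "Tcp", "port": 443}],
        "load_balancing_rules": [
            {
                "name": "rule-443",
                "protocol": "Tcp",
                "frontend_port": 443,
                "backend_port": 443,
                "frontend_ip_configuration": {"id": f"{lb_id}/frontendIPConfigurations/frontend-ip"},
                "backend_address_pool": {"id": f"{lb_id}/backendAddressPools/backend-pool"},
                "probe": {"id": f"{lb_id}/probes/health-probe"},
            }
        ],
    },
).result()
print(lb.frontend_ip_configurations[0].private_ip_address)
```

The VMs (or their NICs) that host your service still need to be added to the backend pool, which is omitted here.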
The private link service on the Azure Load Balancer is required to provide private connectivity to the customer tenant from Databricks Serverless. Each private link service on the load balancer is attached to a separate front-end IP configuration. The private endpoints from Databricks Serverless are linked to this private link service with a specific front-end IP configuration that listens for all incoming traffic. Refer to this official Azure documentation for complete information on how to create a private link service.
The private link service should be in the same region as your standard load balancer. The subnet you choose for the private link service should have the property “disable-private-link-service-network-policies” set to true.
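Continuing the same sketch, the snippet below first disables private link service network policies on the chosen subnet and then creates the private link service attached to the SLB front-end IP configuration created above. The subnet and service names are placeholders.

```python
# A minimal sketch, continuing with the NetworkManagementClient and placeholders above.

# 1) The subnet that hosts the private link service NAT IPs must have private link
#    service network policies disabled.
pls_subnet = client.subnets.get(rg, vnet, "<pls-subnet>")
pls_subnet.private_link_service_network_policies = "Disabled"
pls_subnet = client.subnets.begin_create_or_update(rg, vnet, "<pls-subnet>", pls_subnet).result()

# 2) Create the Private Link service and attach it to the SLB front-end IP configuration.
pls = client.private_link_services.begin_create_or_update(
    rg,
    "my-pls",
    {
        "location": location,
        "load_balancer_frontend_ip_configurations": [
            {"id": f"{lb_id}/frontendIPConfigurations/frontend-ip"}
        ],
        "ip_configurations": [
            {
                "name": "pls-ip-config",
                "subnet": {"id": pls_subnet.id},
                "private_ip_allocation_method": "Dynamic",
                "primary": True,
            }
        ],
    },
).result()
print(pls.id)  # Resource ID to use later in the Databricks NCC private endpoint rule
```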
You can skip this step if you plan to use an existing network connectivity configuration (NCC) in the same region. Make sure this NCC is attached to the workspace from which you plan to test this connectivity.
Databricks Serverless network connectivity is managed with network connectivity configurations (NCCs). Account admins create NCCs in the account console, and an NCC can be attached to one or more workspaces. When you add a private endpoint in an NCC, Azure Databricks creates a private endpoint request to your Azure resource. Once the request is accepted on the resource side, the private endpoint is used to access resources from the serverless compute plane. The private endpoint is dedicated to your Azure Databricks account and accessible only from authorized workspaces.
To create a new NCC in the account console, follow this official document from Databricks.
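For automation, the NCC can also be created with the Databricks SDK for Python against the account API. The following is a minimal sketch; the account ID, NCC name, and region are placeholders, authentication details are omitted, and the method names reflect the SDK's network connectivity API (they may differ slightly across SDK versions).

```python
# A minimal sketch using the Databricks SDK for Python (pip install databricks-sdk).
# Account ID, NCC name, and region are placeholders; authentication (e.g., a service
# principal) is omitted for brevity.
from databricks.sdk import AccountClient

a = AccountClient(
    host="https://accounts.azuredatabricks.net",
    account_id="<databricks-account-id>",
)

# Create the NCC in the same region as the workspace(s) you plan to attach it to.
ncc = a.network_connectivity.create_network_connectivity_configuration(
    name="ncc-serverless-private-link",
    region="<azure-region>",
)
print(ncc.network_connectivity_config_id)
```

The NCC still needs to be attached to your workspace, for example from the account console as described in the Databricks documentation.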
As the final step, we create a private endpoint from the Databricks NCC to the private link service of the standard load balancer that you created.
Log in to the Azure Databricks account console as an account admin and follow this official documentation from Databricks to perform this step. A few screenshots are provided below for quick reference.
In the Azure Databricks account console, navigate to the Cloud resources section and select Network Connectivity Configurations. Then select the chosen NCC object and navigate to the Private endpoint rules tab.
Fill in the Resource ID of the Private link service of the Standard Load Balancer.
Note: This entry is not the Resource ID of the Load Balancer but of the Private link service created on the Load Balancer.
Also, add the necessary domain names that should be used to connect to the resources in your Azure tenant via the Private Link from Databricks Serverless compute plane.
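As an alternative to the console, the same rule can be created programmatically. The sketch below continues with the AccountClient and NCC from the previous step; the resource ID and domain name are placeholders, and the exact parameter names for private link service endpoint rules may differ across Databricks SDK/API versions, so treat this as an illustration of the console fields described above.

```python
# A minimal sketch, continuing with the AccountClient and NCC from the previous step.
# resource_id must be the Resource ID of the Private Link service, NOT the Load Balancer.
# The domain name below is a placeholder FQDN for the service behind the load balancer.
rule = a.network_connectivity.create_private_endpoint_rule(
    network_connectivity_config_id=ncc.network_connectivity_config_id,
    resource_id="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Network/privateLinkServices/my-pls",
    domain_names=["myservice.internal.contoso.com"],
)
print(rule.connection_state)  # remains pending until approved on the Azure side
```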
A private endpoint rule will be created in your Databricks NCC to the Private Link Service on your Standard Load Balancer. This private endpoint will initially be in a “pending” state.
In your Azure portal, go to your Private Link service. Choose the private endpoints tab, and then approve the private endpoint connection from Databricks Serverless NCC. As soon as this is done, the private endpoint will switch to an “established” state in your Databricks NCC.
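If you prefer to approve the connection programmatically instead of in the portal, a sketch using the same azure-mgmt-network client from the earlier steps is shown below; the private link service name is the placeholder used earlier.

```python
# A minimal sketch: find the pending connection from the Databricks NCC on the
# Private Link service and approve it. "my-pls" is the placeholder service name used earlier.
for conn in client.private_link_services.list_private_endpoint_connections(rg, "my-pls"):
    state = conn.private_link_service_connection_state
    if state.status == "Pending":
        state.status = "Approved"
        state.description = "Approved for Databricks Serverless NCC"
        client.private_link_services.update_private_endpoint_connection(
            rg, "my-pls", conn.name, conn
        )
```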
In this blog, we presented a simple way to enable private connectivity from Azure Databricks Serverless to a VNet in a customer’s Azure tenant. The ability to securely connect from Databricks Serverless to customer resources in Azure via a load balancer is an important step toward providing a seamless connectivity experience for customers, similar to what they have with Databricks Classic compute. Enabling this first mile of the networking path from Databricks Serverless to the customer’s environment opens up a wider range of connectivity that customers can build upon. For a detailed reference architecture and step-by-step implementation of this new feature, refer to this document.
In subsequent blogs and articles, we will try to present a few such options to extend this connectivity to on-premises customer resources.