Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Accessing Azure Databricks Workspace via Private Endpoint and On-Premises Proxy

ittzzmalind
New Contributor II

Public access to the Azure Databricks workspace is currently disabled. Access is required through a Private Link (private endpoint – api_ui).

A private endpoint has already been configured successfully:

  • Virtual Network: Vnet-PE-ENDPOINT
  • Subnet: Snet-PE-ENDPOINT
  • Private Link connection to the Databricks workspace is established
  • Connectivity from this VNet to the workspace has been tested and is working as expected (nslookup and a cluster list call from a test VM)

New Requirement

An application hosted on a VM in a different Azure VNet needs to access the Databricks workspace. However, the access must be routed through an on-premises proxy server.

Questions

  1. How can this architecture be configured to enable secure connectivity?
  2. What configuration is required between:
    • The on-premises proxy and Azure Databricks (via Private Endpoint)?
    • The VM VNet and the on-premises proxy?
  3. What networking component should be used to enable this flow?
  4. How can we ensure that the VM ultimately accesses the Databricks workspace via the private endpoint only, without exposing public access?

End Goal

The VM hosted in a separate Azure VNet should be able to securely access the Azure Databricks workspace through the on-premises proxy, while ensuring that all traffic is routed via the private endpoint.

1 ACCEPTED SOLUTION


anuj_lathi
Databricks Employee

This is a classic hub-spoke + on-premises hybrid networking scenario. Here's how to architect it end-to-end.

Architecture Overview

The traffic flow will be:

VM (VNet-App) --> ExpressRoute/VPN Gateway --> On-Prem Proxy Server --> ExpressRoute/VPN Gateway --> VNet-PE-ENDPOINT --> Private Endpoint --> Azure Databricks

 

Step 1: Network Connectivity Between VNets and On-Premises

You need two connectivity paths -- both going through your on-premises network:

VM VNet (VNet-App) to On-Premises:

  • Configure an ExpressRoute circuit or Site-to-Site VPN Gateway in VNet-App (or a hub VNet peered to VNet-App)
  • This allows the VM to route traffic to the on-premises proxy

On-Premises to Private Endpoint VNet (VNet-PE-ENDPOINT):

  • Configure an ExpressRoute circuit or Site-to-Site VPN Gateway in VNet-PE-ENDPOINT (or a hub VNet peered to it)
  • This allows the on-premises proxy to reach the private endpoint's private IP

Recommended: Hub-Spoke Topology

Rather than connecting each VNet individually, use a hub VNet with a single ExpressRoute/VPN gateway:

VNet-App (spoke) ---peering---> Hub VNet <---peering--- VNet-PE-ENDPOINT (spoke)
                                   |
                            ExpressRoute/VPN
                                   |
                           On-Premises Network
                           (Proxy Server here)

 

Enable "Allow Gateway Transit" on the hub peering and "Use Remote Gateway" on each spoke peering so all spokes can use the hub's gateway.

Step 2: Configure the On-Premises Proxy Server

The proxy server (e.g., Squid, nginx, or an enterprise proxy like Zscaler/Blue Coat) must be configured to:

Allow HTTPS traffic to Databricks endpoints:

  • Your workspace URL: adb-xxxxxxxxxxxx.xx.azuredatabricks.net (port 443)
  • Browser auth URL: region.pl-auth.azuredatabricks.net (port 443)
  • Additional ports if needed: 6666, 3306, 8443-8451

Forward traffic toward the Azure private endpoint IP:

  • The proxy must resolve the Databricks workspace URL to the private IP of the private endpoint (not the public IP)
  • This requires proper DNS configuration (see Step 3)

Proxy configuration example (Squid):

# Allow Databricks workspace traffic
acl databricks_hosts dstdomain .azuredatabricks.net
http_access allow databricks_hosts

 

Step 3: DNS Configuration (Critical)

This is the most important step. The proxy server must resolve Databricks URLs to private IPs, not public IPs.

Option A: Conditional DNS Forwarding (Recommended)

On your on-premises DNS server, configure a conditional forwarder:

  • Zone: privatelink.azuredatabricks.net
  • Forward to: An Azure DNS Forwarder (a VM or Azure Firewall DNS proxy in your hub VNet)

Important: Do NOT forward directly to 168.63.129.16 -- this Azure DNS IP only responds to queries from within Azure VNets. You need an intermediary forwarder.

Azure DNS Forwarder options:

  • A small Windows/Linux VM running DNS forwarding (e.g., BIND, Windows DNS, dnsmasq)
  • Azure Firewall with DNS proxy enabled
  • Azure DNS Private Resolver (managed service, no VM needed)
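
As an illustration, if you choose a dnsmasq-based forwarder VM in the hub VNet, a single directive forwards the Private Link zone on to Azure-provided DNS (your on-premises DNS then points its conditional forwarder at this VM's private IP). This is a minimal sketch; listener interfaces, caching, and hardening are environment-specific:

```conf
# /etc/dnsmasq.conf on the Azure forwarder VM (hub VNet)
# Forward only the Databricks Private Link zone to Azure DNS;
# 168.63.129.16 is reachable because this VM sits inside a VNet.
server=/privatelink.azuredatabricks.net/168.63.129.16
```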

Option B: Manual DNS A Records

If conditional forwarding isn't possible, create static A records on your on-premises DNS:

adb-xxxxxxxxxxxx.xx.azuredatabricks.net  -->  10.x.x.x  (private endpoint IP)
region.pl-auth.azuredatabricks.net       -->  10.x.x.x  (same private endpoint IP)

 

Find the private IP from: Azure Portal > Private Endpoint > Network Interface > IP Configuration

Note: Do NOT override accounts.azuredatabricks.net -- the Account Console must resolve publicly.
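
To sanity-check either option from a given host, a short stdlib-only Python helper can confirm that every A record returned for the workspace hostname is a private address. The function is a hypothetical convenience, not part of any Databricks tooling:

```python
import ipaddress
import socket

def resolves_privately(hostname: str) -> bool:
    """True only if every A record for hostname is a private/loopback IP."""
    try:
        _, _, addrs = socket.gethostbyname_ex(hostname)
    except socket.gaierror:
        return False
    return bool(addrs) and all(
        ipaddress.ip_address(a).is_private for a in addrs
    )

# Literal sanity checks (no live DNS needed):
assert ipaddress.ip_address("10.1.2.3").is_private        # typical PE address
assert not ipaddress.ip_address("52.136.1.1").is_private  # typical public IP
```

Run it with your workspace URL on the proxy server itself; a False result means the conditional forwarder or A records are not in effect there.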

Step 4: VM Proxy Configuration

Configure the VM in VNet-App to route Databricks traffic through the on-premises proxy:

Environment variables (Linux):

export HTTPS_PROXY=http://proxy.onprem.company.com:8080
export HTTP_PROXY=http://proxy.onprem.company.com:8080
export NO_PROXY=169.254.169.254,168.63.129.16

 

For Databricks CLI or API calls:

export HTTPS_PROXY=http://proxy.onprem.company.com:8080
databricks clusters list --profile my-workspace

 

For application code (Python example):

import os
os.environ['HTTPS_PROXY'] = 'http://proxy.onprem.company.com:8080'
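
The environment-variable route works because Python's standard library (and libraries layered on it, such as requests) read HTTPS_PROXY and NO_PROXY from the environment. A stdlib-only sketch, with the same hypothetical proxy address as above, shows both the pickup and the NO_PROXY bypass for the Azure metadata and DNS addresses:

```python
import os
import urllib.request

# Hypothetical on-prem proxy; substitute your own.
os.environ["HTTPS_PROXY"] = "http://proxy.onprem.company.com:8080"
# IMDS and Azure DNS must bypass the proxy.
os.environ["NO_PROXY"] = "169.254.169.254,168.63.129.16"

# urllib picks the proxy up from the environment:
proxies = urllib.request.getproxies()
assert proxies["https"] == "http://proxy.onprem.company.com:8080"

# Hosts listed in NO_PROXY bypass the proxy entirely:
assert urllib.request.proxy_bypass_environment("169.254.169.254")
assert not urllib.request.proxy_bypass_environment("adb-123.azuredatabricks.net")
```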

 

Step 5: NSG and Firewall Rules

Ensure Network Security Groups allow the required traffic:

VNet-App NSG (outbound):

  • Allow TCP 443 outbound to on-premises proxy IP

On-premises firewall:

  • Allow the proxy to reach VNet-PE-ENDPOINT subnet on TCP 443

VNet-PE-ENDPOINT NSG (inbound to private endpoint subnet):

  • Allow TCP 443, 6666, 3306, 8443-8451 from on-premises network CIDR
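
To pre-validate these rules hop by hop before involving the application, a small stdlib TCP probe can be run from the VM and from the proxy. The private endpoint IP below is a hypothetical placeholder; read the real one from the PE's network interface as described in Step 3:

```python
import socket

# Hypothetical private endpoint IP; substitute the PE NIC address.
PE_IP = "10.1.2.3"
# The ports from the NSG rule above: 443, 6666, 3306, 8443-8451.
PORTS = [443, 6666, 3306] + list(range(8443, 8452))

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """TCP connect test: True means every hop allows this port through."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From the proxy or the VM:
# for p in PORTS:
#     print(p, port_open(PE_IP, p))
```

A port that succeeds from the proxy but fails from the VM points at the VNet-App outbound NSG or the on-premises firewall rather than the PE subnet.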

Step 6: Verify End-to-End Connectivity

From the on-premises proxy server:

nslookup adb-xxxxxxxxxxxx.xx.azuredatabricks.net
# Should resolve to the private IP (e.g., 10.x.x.x)

 

From the VM (through proxy):

curl -x http://proxy.onprem.company.com:8080 \
  https://adb-xxxxxxxxxxxx.xx.azuredatabricks.net/api/2.0/clusters/list \
  -H "Authorization: Bearer <token>"
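
The same check from application code, stdlib only: building an explicit ProxyHandler mirrors what curl -x does. The proxy address is hypothetical, and the workspace URL and token are the placeholders used throughout:

```python
import urllib.request

# Hypothetical proxy; workspace URL and token are placeholders.
proxy = urllib.request.ProxyHandler(
    {"https": "http://proxy.onprem.company.com:8080"}
)
opener = urllib.request.build_opener(proxy)
req = urllib.request.Request(
    "https://adb-xxxxxxxxxxxx.xx.azuredatabricks.net/api/2.0/clusters/list",
    headers={"Authorization": "Bearer <token>"},
)
# opener.open(req) performs the call; run it only from the VM,
# since the workspace is reachable solely over the private path.
```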

 

Summary Checklist

| Component | Configuration |
|---|---|
| VNet-App to On-Prem | ExpressRoute or VPN (via hub VNet peering) |
| On-Prem to VNet-PE-ENDPOINT | ExpressRoute or VPN (via hub VNet peering) |
| DNS Resolution | Conditional forwarder for privatelink.azuredatabricks.net to an Azure DNS forwarder |
| Proxy Server | Allow *.azuredatabricks.net on port 443 |
| VM Proxy Config | Set the HTTPS_PROXY environment variable |
| NSGs | Allow 443 (and 6666, 3306, 8443-8451) between all hops |
| Validation | nslookup from the proxy + curl from the VM through the proxy |

Key Gotcha

The most common failure is DNS resolution. If the proxy resolves the Databricks URL to a public IP instead of the private endpoint IP, the connection will fail because public access is disabled. Always verify with nslookup from the proxy server itself.


Anuj Lathi
Solutions Engineer @ Databricks
