Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

aws databricks with frontend private link

margarita_shir
New Contributor II

In the AWS Databricks documentation, front-end PrivateLink assumes a separate transit VPC connected via Direct Connect/VPN. However, I'm implementing a different architecture that uses Tailscale for private network access.
My setup:

  • Tailscale subnet router deployed directly within the same VPC as the Databricks workspace (no separate transit VPC)
  • Subnet router advertises the entire VPC CIDR, making all workspace resources accessible to Tailscale clients
  • Existing backend workspace VPC endpoint already configured for cluster-to-control-plane REST API communication
My question: since my Tailscale subnet router can reach the backend endpoint's private IP directly within the same VPC, could I reuse this existing workspace endpoint for front-end user access as well, instead of creating a separate front-end endpoint?

1 ACCEPTED SOLUTION

Louis_Frolio
Databricks Employee

Hello @margarita_shir 

Short answer: yes. If your clients can privately reach the existing Databricks "Workspace (including REST API)" interface endpoint, you can reuse that same VPC endpoint for front-end (user) access. What you must not do is point users at the secure cluster connectivity (SCC) relay endpoint: the SCC relay carries only compute-to-control-plane traffic on port 6666, while the "Workspace (including REST API)" service is the one that serves both the web UI and the REST APIs, for both front-end and back-end scenarios.

Why this works

  • The Databricks PrivateLink endpoint service named "Workspace (including REST API)" is used for both front-end user access and back-end REST calls from compute, so the same service behind your existing VPCE is valid for browsers, the CLI, JDBC/ODBC, and other tooling over HTTPS. You just need private reachability plus the right DNS and Databricks settings. Do not use the SCC relay service for front-end traffic; it is a different service on a different port (6666).

  • A "transit VPC" is the common pattern for front-end access, but it is not a hard requirement. Front-end PrivateLink traffic simply needs a private path from clients to the VPCE; your Tailscale subnet router in the workspace VPC satisfies that reachability requirement as long as it routes/advertises the VPCE's private IPs to clients.

What you need to change to make it work

  • Private Access Settings (PAS): Add the existing Workspace (REST) VPCE registration to the workspace's PAS and set the access level so the workspace accepts front-end connections from that endpoint (ENDPOINT or ACCOUNT, as appropriate). This is what authorizes your front-end traffic through that VPCE; the Terraform sketch after this list shows one way to wire it up.

  • Internal DNS: Make your workspace URL resolve to the private IP of that same Workspace (REST) VPCE for your Tailscale clients. In practice, configure your internal DNS so the workspace hostname maps to the VPCE's private IP; Databricks provides regional PrivateLink hostnames you can map for this purpose. This is the critical step that steers browser/API traffic privately to the endpoint instead of over the public internet.

  • IdP redirect (only if using SSO): Add the Databricks "PrivateLink Redirect URI" to your identity provider so browser-based SSO completes over the private path. Keep the original (public) redirect URL if you also have non-PrivateLink workspaces.

  • Security groups on the VPCE: Ensure the VPCE's security group allows inbound/outbound HTTPS (443) from your Tailscale-advertised address space, while still allowing any ports your compute needs for back-end REST (for example, 8443 for internal control-plane API calls). Databricks recommends a separate security group per endpoint, following least privilege, but it is not required; you can widen the existing SG if that is simpler.

  • Registration state: If you originally registered the VPCE only in the "network configuration" for back-end use, you can also reference the same VPCE registration in PAS for front-end authorization; registrations are generic. You don't need to create a second, separate VPCE solely for front-end access if you can reach the existing one.
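To make the moving parts above concrete, here is a minimal Terraform sketch of the PAS, registration, DNS, and security-group pieces. It assumes the databricks (account-level) and aws providers are already configured; every name and variable, and the aws_vpc_endpoint.workspace / aws_vpc_endpoint.relay references, are placeholders standing in for your existing resources, so treat it as a shape to adapt rather than a drop-in module:

```hcl
# 1. Register the existing backend VPCE with the Databricks account.
resource "databricks_mws_vpc_endpoint" "workspace" {
  account_id          = var.databricks_account_id
  aws_vpc_endpoint_id = aws_vpc_endpoint.workspace.id # your existing VPCE
  vpc_endpoint_name   = "workspace-rest-api"
  region              = var.region
}

# 2. PAS: authorize front-end connections through that same registration.
resource "databricks_mws_private_access_settings" "this" {
  private_access_settings_name = "frontend-via-existing-vpce"
  region                       = var.region
  public_access_enabled        = false      # or true for hybrid access
  private_access_level         = "ENDPOINT" # only the endpoints listed below
  allowed_vpc_endpoint_ids     = [databricks_mws_vpc_endpoint.workspace.vpc_endpoint_id]
}

# 3. Network configuration: the same registration also serves backend REST.
resource "databricks_mws_networks" "this" {
  account_id         = var.databricks_account_id
  network_name       = "workspace-vpc"
  vpc_id             = var.workspace_vpc_id
  subnet_ids         = var.private_subnet_ids
  security_group_ids = [var.workspace_security_group_id]
  vpc_endpoints {
    rest_api        = [databricks_mws_vpc_endpoint.workspace.vpc_endpoint_id]
    dataplane_relay = [databricks_mws_vpc_endpoint.relay.vpc_endpoint_id]
  }
}

# 4. Internal DNS: resolve the workspace URL to the VPCE inside the VPC.
resource "aws_route53_zone" "databricks" {
  # Caution: names under this private zone without records stop resolving
  # inside the VPC, so add records for every Databricks hostname you use.
  name = "cloud.databricks.com"
  vpc {
    vpc_id = var.workspace_vpc_id
  }
}

resource "aws_route53_record" "workspace" {
  zone_id = aws_route53_zone.databricks.zone_id
  name    = "my-workspace.cloud.databricks.com" # placeholder workspace URL
  type    = "A"
  alias {
    # Aliasing to the interface endpoint tracks its private IPs automatically.
    name                   = aws_vpc_endpoint.workspace.dns_entry[0].dns_name
    zone_id                = aws_vpc_endpoint.workspace.dns_entry[0].hosted_zone_id
    evaluate_target_health = false
  }
}

# 5. Security group: allow HTTPS from the Tailscale-advertised range.
resource "aws_security_group_rule" "vpce_https_from_tailnet" {
  type      = "ingress"
  from_port = 443
  to_port   = 443
  protocol  = "tcp"
  # With Tailscale's default SNAT, the endpoint sees the subnet router's
  # VPC address, so the VPC CIDR is usually the range that matters here.
  cidr_blocks       = [var.workspace_vpc_cidr]
  security_group_id = var.vpce_security_group_id
}
```

One extra wrinkle: your Tailscale clients also need a resolver that can see the private hosted zone, for example by pointing the tailnet's split DNS for cloud.databricks.com at the VPC resolver via the subnet router.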

Things not to do

  • Don't point users at the SCC relay endpoint; it carries the compute tunnel only (TCP 6666) and won't serve the web UI or REST APIs over HTTPS.

Validation tips

  • DNS test: From a Tailscale client, resolve your workspace hostname and confirm it returns the VPCE private IP you expect (for your region's PrivateLink control-plane domain); see the sketch after these tips.

  • Connectivity test: From a Tailscale client, browse to the workspace URL or curl the REST root over HTTPS and verify you reach the UI/API privately; if using SSO, confirm the IdP round trip succeeds with the PrivateLink Redirect URI.
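If you manage this stack with Terraform anyway, one way to make the DNS check repeatable is a tiny sketch using the hashicorp/dns provider; the hostname is a placeholder for your workspace URL:

```hcl
# Resolve the workspace hostname from wherever Terraform runs
# (e.g., a Tailscale client) and surface the answer for inspection.
data "dns_a_record_set" "workspace" {
  host = "my-workspace.cloud.databricks.com" # placeholder workspace URL
}

output "workspace_resolved_ips" {
  # Expect the VPCE's private IPs here, not public control-plane IPs.
  value = data.dns_a_record_set.workspace.addrs
}
```

For the connectivity test, a browser visit or a plain curl against https://<workspace-url> from the same client confirms the HTTPS path end to end.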

When you might still choose a separate front-end VPCE

  • Operational isolation: Some teams maintain a distinct front-end VPCE (often in a "shared services"/transit VPC) so they can manage different security groups, route tables, and DNS boundaries for user/browser traffic versus compute traffic. This is a best-practice pattern but not strictly required for functionality.

In summary: reusing your existing "Workspace (including REST API)" VPCE for front-end access is supported and can work well with your Tailscale-based reachability, provided you update PAS, DNS, the IdP (if applicable), and security group rules accordingly. The SCC relay VPCE cannot be reused for front-end traffic.

Hope these hints/tips are helpful.
Cheers, Louis.

2 REPLIES

margarita_shir
New Contributor II

Hi everyone,

I have a question about the IAM role for workspace root storage when deploying Databricks on AWS with custom configurations (customer-managed VPC, storage configurations, credential configurations, etc.).

At an earlier stage of our deployment, I was following the manual setup documentation here:

https://docs.databricks.com/aws/en/admin/workspace/create-uc-workspace

Specifically this step:

https://docs.databricks.com/aws/en/admin/workspace/create-uc-workspace#create-a-storage-configuratio...

This section describes creating a storage configuration for the workspace root S3 bucket and includes creating an IAM role that Databricks assumes to access this bucket.

However, when managing the same setup via Terraform, the equivalent resource, databricks_mws_storage_configurations (documented here:

https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/aws-workspace#root-...), does not support specifying an IAM role at all, and the Terraform documentation entirely omits creating or attaching a role for the root bucket.
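For reference, the bucket-policy pattern that guide uses looks roughly like this (a sketch with placeholder names, not our exact configuration):

```hcl
# Root bucket access is granted via a bucket policy generated for the
# Databricks account; no IAM role is attached anywhere in this flow.
resource "aws_s3_bucket" "root" {
  bucket = "my-workspace-root-bucket" # placeholder
}

data "databricks_aws_bucket_policy" "root" {
  bucket = aws_s3_bucket.root.bucket
}

resource "aws_s3_bucket_policy" "root" {
  bucket = aws_s3_bucket.root.id
  policy = data.databricks_aws_bucket_policy.root.json
}

resource "databricks_mws_storage_configurations" "this" {
  account_id                 = var.databricks_account_id
  storage_configuration_name = "workspace-root-storage"
  bucket_name                = aws_s3_bucket.root.bucket
  # Note: there is no IAM role argument on this resource.
}
```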

This raised a few questions for me:

Was the IAM role originally intended for Unity Catalog storage within the root bucket, and has it since been deprecated in favor of separate storage?

Initially, I thought it might be a good idea to explicitly specify an S3 bucket path in the metastore resource (so-called metastore-level storage). After reading more documentation, though, I realized that Databricks best practices recommend assigning storage at the catalog level (managed through external locations and storage credentials), using an S3 bucket separate from the root bucket that stores workspace assets (such as data, libraries, and logs). Hence we create managed catalogs by specifying an external location resource, and Databricks auto-generates the subpath (e.g., s3://databricks-unitycatalog/cps_business_insights/__unitystorage/catalogs/1234fda622-2cfb-478f-bbc4-b9cb84242baf).
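Concretely, the catalog-level pattern I'm describing looks roughly like this in Terraform (a sketch; the names, bucket, and role ARN are placeholders):

```hcl
# A Unity Catalog storage credential wrapping its own IAM role,
# separate from the workspace root bucket.
resource "databricks_storage_credential" "uc" {
  name = "uc-business-insights"
  aws_iam_role {
    role_arn = var.uc_iam_role_arn # role trusted by Databricks for UC access
  }
}

resource "databricks_external_location" "uc" {
  name            = "uc-business-insights"
  url             = "s3://databricks-unitycatalog" # placeholder UC bucket
  credential_name = databricks_storage_credential.uc.name
}

resource "databricks_catalog" "business_insights" {
  name         = "cps_business_insights"
  storage_root = databricks_external_location.uc.url
  # Databricks generates the __unitystorage/catalogs/<uuid> subpath itself.
}
```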

Is the modern best practice to use:

  • Root S3 bucket (accessed via bucket policy only) → stores workspace assets (notebooks, cluster logs, libraries)
  • Separate Unity Catalog bucket for catalog-level storage (with its own IAM role)

Can anyone clarify if this understanding is correct from a security best practices perspective?

Thanks in advance!