11-14-2025 08:01 PM
Hello @margarita_shir
Short answer: yes. If your clients can privately reach the existing Databricks "Workspace (including REST API)" interface endpoint, you can reuse that same VPC endpoint for front-end (user) access. That PrivateLink endpoint service serves both front-end user access and back-end REST calls from compute, so the same service behind your existing VPCE is valid for browsers, the CLI, JDBC/ODBC, and other tooling over HTTPS; you just need private reachability plus the right DNS and Databricks settings. Do not use the secure cluster connectivity (SCC) relay endpoint for users: it is a different service that only carries compute-to-control-plane traffic on port 6666.
A "transit VPC" is the common pattern for front-end access, but it is not a hard requirement. Front-end PrivateLink traffic simply needs a private path from clients to the VPCE; your Tailscale subnet router in the workspace VPC satisfies that reachability requirement as long as it routes/advertises the VPCE's private IPs to clients.
Private Access Settings (PAS): Add the existing Workspace (REST) VPCE registration to the workspace's PAS and set the access level so the workspace accepts front-end connections from that endpoint (Endpoint or Account level, as appropriate). This is what authorizes your front-end traffic through that VPCE.
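If you manage this in Terraform, a minimal sketch of the PAS side looks roughly like the following; the variable for the registration ID, the names, and the access level are assumptions for illustration, not your actual values:

```hcl
# Sketch only: authorize front-end connections from a specific VPCE registration.
resource "databricks_mws_private_access_settings" "pas" {
  private_access_settings_name = "frontend-pas"   # hypothetical name
  region                       = var.region
  public_access_enabled        = false            # block access from the public internet
  private_access_level         = "ENDPOINT"       # only the listed registrations may connect
  allowed_vpc_endpoint_ids     = [var.workspace_vpce_registration_id]  # assumption: your existing registration ID
}
```

The workspace then references this PAS through its private_access_settings_id.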
Internal DNS: Make your workspace URL resolve to the private IP of that same Workspace (REST) VPCE for your Tailscale clients. In practice, configure your internal DNS so the workspace hostname maps to the VPCE's private IP; Databricks publishes regional PrivateLink hostnames you can map for this purpose. This is the critical step that steers browser/API traffic privately to the endpoint instead of the public internet.
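One AWS-native way to wire this up, as a sketch assuming Route 53 private hosted zones and placeholder names (your internal DNS may differ):

```hcl
# Look up the existing workspace/REST interface endpoint (ID is an assumed variable).
data "aws_vpc_endpoint" "workspace_rest" {
  id = var.workspace_vpce_id
}

# Private zone resolvable only from the associated VPC(s).
resource "aws_route53_zone" "databricks" {
  name = "cloud.databricks.com"
  vpc {
    vpc_id = var.workspace_vpc_id   # assumption: the VPC your Tailscale subnet router sits in
  }
}

# Steer the workspace hostname (placeholder) at the endpoint's private DNS name.
resource "aws_route53_record" "workspace" {
  zone_id = aws_route53_zone.databricks.zone_id
  name    = "my-workspace.cloud.databricks.com"   # hypothetical workspace URL
  type    = "CNAME"
  ttl     = 300
  records = [data.aws_vpc_endpoint.workspace_rest.dns_entry[0]["dns_name"]]
}
```

Your Tailscale clients also need to send DNS queries for that zone to the VPC resolver, for example via split DNS in the tailnet's DNS settings.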
IdP redirect (only if using SSO): Add the Databricks "PrivateLink Redirect URI" to your identity provider so browser-based SSO completes over the private path. Keep the original (public) redirect URL if you also have non-PrivateLink workspaces.
Security groups on the VPCE: Ensure the VPCE's security group allows inbound/outbound HTTPS (443) from your Tailscale-advertised address space, while still allowing any ports your compute needs for back-end REST (for example, 8443 for internal control-plane API calls). Databricks recommends separate security groups per endpoint following least privilege, but it is not required; you can widen the existing SG if that is simpler.
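A sketch of widening the existing endpoint SG; the SG ID and the source range are assumptions (traffic may arrive from the Tailscale CGNAT range or from the subnet router's VPC address, depending on whether the router SNATs):

```hcl
resource "aws_security_group_rule" "vpce_https_from_tailscale" {
  security_group_id = var.workspace_vpce_sg_id   # assumption: SG attached to the existing VPCE
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 443
  to_port           = 443
  cidr_blocks       = ["100.64.0.0/10"]          # example: Tailscale CGNAT range; narrow as appropriate
}
```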
Registration state: If you originally registered the VPCE only in the network configuration for back-end use, you can also reference the same VPCE registration in the PAS for front-end authorization; registrations are generic. You do not need to create a second, separate VPCE solely for front-end if you can reach the existing one.
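In Terraform terms, a single registration object can serve both sides; a sketch with placeholder names:

```hcl
# Register the existing AWS interface endpoint with the Databricks account.
resource "databricks_mws_vpc_endpoint" "workspace" {
  aws_vpc_endpoint_id = var.workspace_vpce_id   # assumption: the existing endpoint's AWS ID
  vpc_endpoint_name   = "workspace-rest-vpce"   # hypothetical name
  region              = var.region
}

# Its vpc_endpoint_id attribute is what goes both into the vpc_endpoints.rest_api list
# on databricks_mws_networks (back-end) and into allowed_vpc_endpoint_ids on the PAS (front-end).
```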
DNS test: From a Tailscale client, resolve your workspace hostname and confirm it returns the VPCE private IP you expect (for your region's PrivateLink control-plane domain).
Connectivity test: From a Tailscale client, browse to the workspace URL or curl the REST root over HTTPS and verify you reach the UI/API privately; if using SSO, confirm the IdP round trip succeeds with the PrivateLink Redirect URI. A codified version of both checks is sketched below.
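If you prefer these two checks codified rather than run by hand, one possible sketch uses the hashicorp/dns and hashicorp/http providers; the hostname is a placeholder, and it must be run from a machine on the tailnet:

```hcl
terraform {
  required_providers {
    dns  = { source = "hashicorp/dns" }
    http = { source = "hashicorp/http" }
  }
}

# DNS test: the workspace hostname should resolve to the VPCE's private IP(s).
data "dns_a_record_set" "workspace" {
  host = "my-workspace.cloud.databricks.com"
}

# Connectivity test: fetching the workspace URL should succeed over the private path.
data "http" "workspace" {
  url = "https://my-workspace.cloud.databricks.com"
}

output "workspace_resolves_to" {
  value = data.dns_a_record_set.workspace.addrs   # expect the VPCE private IP(s)
}

output "workspace_http_status" {
  value = data.http.workspace.status_code   # expect 200 after redirects
}
```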
In summary: reusing your existing "Workspace (including REST API)" VPCE for front-end access is supported and works well with your Tailscale-based reachability, provided you update the PAS, DNS, IdP (if applicable), and security group rules accordingly. The SCC relay VPCE cannot be reused for front-end traffic.
3 weeks ago
Hi everyone,
I have a question about the IAM role for workspace root storage when deploying Databricks on AWS with custom configurations (customer-managed VPC, storage configurations, credential configurations, etc.).
At an earlier stage of our deployment, I was following the manual setup documentation here:
https://docs.databricks.com/aws/en/admin/workspace/create-uc-workspace
Specifically this step:
This section describes creating a storage configuration for the workspace root S3 bucket and includes creating an IAM role that Databricks assumes to access this bucket.
However, when managing the same setup via Terraform, the equivalent resource, databricks_mws_storage_configurations (as documented here:
https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/aws-workspace#root-...), does not support specifying an IAM role at all, and the Terraform documentation omits creating or attaching a role for the root bucket entirely.
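For context, a condensed sketch of what that guide does (bucket and account values are placeholders): the only grant is the bucket policy, and the storage configuration itself has no role argument.

```hcl
resource "aws_s3_bucket" "root" {
  bucket = "my-root-bucket"   # hypothetical root bucket
}

# Databricks-generated policy granting the control plane access to the root bucket.
data "databricks_aws_bucket_policy" "root" {
  bucket = aws_s3_bucket.root.bucket
}

resource "aws_s3_bucket_policy" "root" {
  bucket = aws_s3_bucket.root.id
  policy = data.databricks_aws_bucket_policy.root.json
}

# Note: only a bucket name; there is nowhere to attach an IAM role here.
resource "databricks_mws_storage_configurations" "this" {
  account_id                 = var.databricks_account_id
  storage_configuration_name = "workspace-root-storage"
  bucket_name                = aws_s3_bucket.root.bucket
}
```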
This raised a few questions for me:
Was the IAM role originally intended for Unity Catalog storage within the root bucket, but has since been deprecated in favor of separate storage?
Initially, I thought it might be a good idea to explicitly specify an S3 bucket path in the metastore resource (so-called metastore-level storage). After reading more documentation, though, I realized that Databricks best practice is to assign storage at the catalog level (managed through external locations and storage credentials), using an S3 bucket separate from the root bucket that stores workspace assets such as data, libraries, and logs. We therefore create managed catalogs by specifying an external location, and Databricks auto-generates the subpath (e.g., s3://databricks-unitycatalog/cps_business_insights/__unitystorage/catalogs/1234fda622-2cfb-478f-bbc4-b9cb84242baf).
Is the modern best practice therefore: a root S3 bucket (accessed via bucket policy only) that stores workspace assets (notebooks, cluster logs, libraries), plus a separate Unity Catalog bucket (with its own IAM role) for data? A Terraform sketch of that catalog-level layout follows the questions below.
Can anyone clarify if this understanding is correct from a security best practices perspective?
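For reference, the catalog-level pattern described above looks roughly like this in Terraform; the role ARN, bucket, and names are placeholders, so treat it as a sketch under my assumptions rather than a drop-in config:

```hcl
# Storage credential wrapping the IAM role Unity Catalog assumes for the data bucket.
resource "databricks_storage_credential" "uc" {
  name = "uc-data-credential"
  aws_iam_role {
    role_arn = var.uc_role_arn   # assumption: a role with access to the UC bucket
  }
}

# External location scoping that credential to the data bucket (separate from the root bucket).
resource "databricks_external_location" "uc" {
  name            = "uc-data"
  url             = "s3://databricks-unitycatalog"   # bucket from the example above
  credential_name = databricks_storage_credential.uc.name
}

# Managed catalog whose storage root lives under the external location;
# Databricks generates the __unitystorage subpath automatically.
resource "databricks_catalog" "business_insights" {
  name         = "cps_business_insights"
  storage_root = databricks_external_location.uc.url
}
```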
Thanks in advance!