11-14-2025 08:01 PM
Hello @margarita_shir
Short answer: yes. If your clients can privately reach the existing Databricks “Workspace (including REST API)” interface endpoint, you can reuse that same VPC endpoint for front-end (user) access. Do not try to use the secure cluster connectivity (SCC) relay endpoint for users: the SCC relay carries only compute-to-control-plane traffic on port 6666, whereas the “Workspace (including REST API)” service serves both the web UI and the REST APIs, for both front-end and back-end scenarios.
The Databricks PrivateLink endpoint service named “Workspace (including REST API)” is used for both front-end user access and back-end REST calls from compute, so the same service behind your existing VPCE is valid for browsers, the CLI, JDBC/ODBC, and other tooling over HTTPS. You just need private reachability plus the right DNS and Databricks settings.
A “transit VPC” is the common pattern for front‑end, but it’s not a hard requirement. Front‑end PrivateLink endpoint traffic simply needs a private path from clients to the VPCE; your Tailscale subnet router in the workspace VPC satisfies that reachability requirement as long as it routes/advertises the VPCE’s private IPs to clients.
Private Access Settings (PAS): Add the existing Workspace (REST) VPCE registration to the workspace’s PAS and set the access level so the workspace will accept front‑end connections from that endpoint (Endpoint or Account as appropriate). This is what authorizes your front‑end traffic through that VPCE.
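For reference, a minimal Terraform sketch of that step, assuming the Databricks provider is configured at the account level; the names, region, VPCE ID, and var.* inputs are placeholders, and the registration likely already exists from your back-end setup:

# Registration of the existing AWS VPC endpoint with Databricks
# (you probably already have this from the back-end configuration).
resource "databricks_mws_vpc_endpoint" "workspace_rest" {
  account_id          = var.databricks_account_id
  aws_vpc_endpoint_id = "vpce-0123456789abcdef0"  # your existing Workspace (REST) VPCE
  vpc_endpoint_name   = "workspace-rest-vpce"
  region              = "us-east-1"               # your region
}

# Private access settings that authorize front-end traffic through it.
resource "databricks_mws_private_access_settings" "pas" {
  private_access_settings_name = "pas-frontend"
  region                       = "us-east-1"
  public_access_enabled        = false        # true if you also want hybrid public access
  private_access_level         = "ENDPOINT"   # or "ACCOUNT" to allow any registered VPCE
  allowed_vpc_endpoint_ids     = [databricks_mws_vpc_endpoint.workspace_rest.vpc_endpoint_id]
}

The workspace then references this PAS via its private_access_settings_id; allowed_vpc_endpoint_ids only matters at the “ENDPOINT” access level.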
Internal DNS: Make your workspace URL resolve to the private IP of that same Workspace (REST) VPCE for your Tailscale clients. In practice, configure your internal DNS so the workspace hostname maps to the VPCE’s private IP; Databricks provides regional privatelink hostnames you can map for this purpose. This is the critical step that steers browser/API traffic privately to the endpoint instead of the public internet.
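As an illustration, one common way to wire this up is a Route 53 private hosted zone; the hostname and IP below are placeholders for your actual workspace URL and the VPCE's ENI address:

# Private hosted zone for the workspace hostname, associated with the
# VPC your Tailscale subnet router sits in.
resource "aws_route53_zone" "workspace_private" {
  name = "my-workspace.cloud.databricks.com"  # placeholder: your workspace URL
  vpc {
    vpc_id = var.workspace_vpc_id
  }
}

# A record pointing the workspace hostname at the VPCE's private IP.
# Note the ENI IP can change if the endpoint is ever recreated.
resource "aws_route53_record" "workspace" {
  zone_id = aws_route53_zone.workspace_private.zone_id
  name    = "my-workspace.cloud.databricks.com"
  type    = "A"
  ttl     = 300
  records = [var.vpce_private_ip]  # the endpoint's ENI IP
}

Keep in mind that Tailscale clients only see this zone if their queries are resolved inside the VPC, e.g., via split DNS pointing the workspace domain at the VPC resolver.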
IdP redirect (only if using SSO): Add the Databricks “PrivateLink Redirect URI” to your identity provider so browser-based SSO completes over the private path. Keep the original (public) redirect URL if you also have non‑PrivateLink workspaces.
Security groups on the VPCE: Ensure the VPCE’s security group allows inbound/outbound HTTPS (443) from your Tailscale-advertised address space, while still allowing any ports your compute needs for back‑end REST (for example, 8443 for internal control-plane API calls). Databricks recommends separate security groups per endpoint following least privilege, but it’s not required; you can widen the existing SG if that’s simpler.
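A sketch of the corresponding ingress rule (the SG ID and CIDR are assumptions; use the source range your clients actually arrive from, which is the subnet router's subnet if SNAT is enabled):

# Allow front-end HTTPS from the Tailscale-advertised range
# on the existing VPCE security group.
resource "aws_vpc_security_group_ingress_rule" "vpce_https_from_tailscale" {
  security_group_id = var.vpce_security_group_id
  description       = "Front-end HTTPS from Tailscale clients"
  cidr_ipv4         = "10.0.0.0/16"  # placeholder: your advertised/NATed source range
  ip_protocol       = "tcp"
  from_port         = 443
  to_port           = 443
}

Leave the existing back-end rules (443, 8443, and so on) in place on the same SG or a separate one.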
Registration state: If you originally registered the VPCE only in the “network configuration” for back‑end, you can also reference the same VPCE registration in PAS for front‑end authorization; registrations are generic. You don’t need to create a second, separate VPCE solely for front‑end if you can reach the existing one.
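In Terraform terms, the same registration is simply referenced in both places; a sketch reusing the workspace_rest registration from above and assuming an existing SCC relay registration:

# Back-end: the network configuration references the registration...
resource "databricks_mws_networks" "this" {
  account_id         = var.databricks_account_id
  network_name       = "workspace-network"
  vpc_id             = var.workspace_vpc_id
  subnet_ids         = var.subnet_ids
  security_group_ids = var.security_group_ids
  vpc_endpoints {
    rest_api        = [databricks_mws_vpc_endpoint.workspace_rest.vpc_endpoint_id]
    dataplane_relay = [databricks_mws_vpc_endpoint.scc_relay.vpc_endpoint_id]  # assumed existing relay registration
  }
}

# ...and the PAS above references the very same workspace_rest registration
# in allowed_vpc_endpoint_ids, so no second VPCE is required for front-end.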
DNS test: From a Tailscale client, resolve your workspace hostname and confirm it returns the VPCE private IP you expect (for your region’s privatelink control-plane domain).
Connectivity test: From a Tailscale client, browse to the workspace URL or curl the REST root over HTTPS and verify you reach the UI/API privately; if using SSO, confirm the IdP roundtrip succeeds with the PrivateLink Redirect URI.
In summary: Reusing your existing “Workspace (including REST API)” VPCE for front‑end is supported and can work well with your Tailscale-based reachability, provided you update PAS, DNS, IdP (if applicable), and security group rules accordingly. The SCC relay VPCE cannot be reused for front‑end traffic.
3 weeks ago
Hi everyone,
I have a question about the IAM role for workspace root storage when deploying Databricks on AWS with custom configurations (customer-managed VPC, storage configurations, credential configurations, etc.).
At an earlier stage of our deployment, I was following the manual setup documentation here:
https://docs.databricks.com/aws/en/admin/workspace/create-uc-workspace
Specifically this step: “This section describes creating a storage configuration for the workspace root S3 bucket and includes creating an IAM role that Databricks assumes to access this bucket.”
However, when managing the same setup via Terraform, the equivalent resource, databricks_mws_storage_configurations (documented here:
https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/aws-workspace#root-...), does not support specifying an IAM role at all, and the Terraform documentation entirely omits creating or attaching a role for the root bucket.
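For concreteness, here is roughly what that part of our Terraform looks like (names are placeholders); there is no IAM role argument, only the bucket plus a bucket policy rendered by the provider's helper data source:

# Root storage configuration: bucket only; there is no IAM role argument.
resource "databricks_mws_storage_configurations" "this" {
  account_id                 = var.databricks_account_id
  storage_configuration_name = "workspace-root-storage"
  bucket_name                = aws_s3_bucket.root.bucket
}

resource "aws_s3_bucket" "root" {
  bucket = "my-workspace-root-bucket"  # placeholder
}

# Helper data source that renders the bucket policy granting
# Databricks access to the workspace root bucket.
data "databricks_aws_bucket_policy" "root" {
  bucket = aws_s3_bucket.root.bucket
}

resource "aws_s3_bucket_policy" "root" {
  bucket = aws_s3_bucket.root.id
  policy = data.databricks_aws_bucket_policy.root.json
}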
This raised a few questions for me:
Was the IAM role originally intended for Unity Catalog storage within the root bucket, and has it since been deprecated in favor of separate storage?
Initially, I thought it might be a good idea to explicitly specify an S3 bucket path in the metastore resource (so-called metastore-level storage), but after reading more documentation, I realized that Databricks best practices recommend assigning storage at the catalog level (managed via external locations and storage credentials), using an S3 bucket separate from the root S3 bucket that stores workspace assets (such as data, libraries, and logs). Hence we create managed catalogs by specifying an external location resource, and Databricks auto-generates the subpath (e.g., s3://databricks-unitycatalog/cps_business_insights/__unitystorage/catalogs/1234fda622-2cfb-478f-bbc4-b9cb84242baf).
Is the modern best practice therefore:
- Root S3 bucket (accessed via bucket policy only) → stores workspace assets (notebooks, cluster logs, libraries)
- Separate Unity Catalog metastore bucket (with its own IAM role)
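For context, the catalog-level pattern we landed on looks roughly like this in Terraform (names, bucket, and role ARN are placeholders):

# IAM role for Unity Catalog access, wrapped as a storage credential.
resource "databricks_storage_credential" "uc" {
  name = "uc-credential"  # placeholder
  aws_iam_role {
    role_arn = var.uc_iam_role_arn  # role trusted by Unity Catalog
  }
}

# External location covering the UC bucket (separate from the root bucket).
resource "databricks_external_location" "uc" {
  name            = "uc-managed-storage"
  url             = "s3://databricks-unitycatalog"  # placeholder bucket
  credential_name = databricks_storage_credential.uc.name
}

# Catalog-level managed storage; Databricks generates the
# __unitystorage/catalogs/<uuid> subpath under this root.
resource "databricks_catalog" "business_insights" {
  name         = "cps_business_insights"
  storage_root = databricks_external_location.uc.url
}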
Can anyone clarify if this understanding is correct from a security best practices perspective?
Thanks in advance!