<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: aws databricks deployment with custom configurations in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/aws-databricks-with-frontend-private-link/m-p/142380#M4669</link>
<description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I have a question about the &lt;STRONG&gt;IAM role for workspace root storage&lt;/STRONG&gt; when deploying Databricks on AWS with custom configurations (customer-managed VPC, storage configurations, credential configurations, etc.).&lt;/P&gt;&lt;P&gt;At an earlier stage of our deployment, I was following the manual setup documentation here:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/admin/workspace/create-uc-workspace" target="_blank"&gt;https://docs.databricks.com/aws/en/admin/workspace/create-uc-workspace&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Specifically this step:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/admin/workspace/create-uc-workspace#create-a-storage-configuration" target="_blank"&gt;https://docs.databricks.com/aws/en/admin/workspace/create-uc-workspace#create-a-storage-configuration&lt;/A&gt;&lt;/P&gt;&lt;P&gt;This section describes creating a storage configuration for the workspace root S3 bucket, including an IAM role that Databricks assumes to access this bucket.&lt;/P&gt;&lt;P&gt;However, when managing the same setup via Terraform, the equivalent resource, databricks_mws_storage_configurations (documented here: &lt;A href="https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/aws-workspace#root-bucket" target="_blank"&gt;https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/aws-workspace#root-bucket&lt;/A&gt;), does not support specifying an IAM role at all, and the Terraform documentation omits creating or attaching a role for the root bucket entirely.&lt;/P&gt;&lt;P&gt;This raised a few questions for me:&lt;/P&gt;&lt;P&gt;Was the IAM role originally intended for Unity Catalog storage within the root bucket, but since deprecated in favor of separate storage?&lt;/P&gt;&lt;P&gt;Initially, I thought it might be a good idea to explicitly specify an S3 bucket path in the metastore resource (so-called metastore-level storage), but after reading more documentation, I realized that Databricks best practices recommend assigning storage at the catalog level (managed via external locations and storage credentials), in an S3 bucket separate from the root S3 bucket that stores workspace assets (such as data, libraries, and logs). Hence we create managed catalogs by specifying an external location resource, and Databricks auto-generates the subpath (e.g., s3://databricks-unitycatalog/cps_business_insights/__unitystorage/catalogs/1234fda622-2cfb-478f-bbc4-b9cb84242baf).&lt;/P&gt;&lt;P&gt;Is the modern best practice therefore: a root S3 bucket (accessed via bucket policy only) that stores workspace assets (notebooks, cluster logs, libraries), plus a separate Unity Catalog metastore bucket (with its own IAM role)?&lt;/P&gt;&lt;P&gt;Can anyone clarify whether this understanding is correct from a security best-practices perspective?&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
    <pubDate>Mon, 22 Dec 2025 21:09:57 GMT</pubDate>
    <dc:creator>margarita_shir</dc:creator>
    <dc:date>2025-12-22T21:09:57Z</dc:date>
    <item>
      <title>aws databricks with frontend private link</title>
      <link>https://community.databricks.com/t5/administration-architecture/aws-databricks-with-frontend-private-link/m-p/138991#M4465</link>
<description>&lt;P&gt;In the AWS Databricks documentation, frontend PrivateLink assumes a separate transit VPC connected via Direct Connect/VPN. However, I'm implementing a different architecture using Tailscale for private network access.&lt;/P&gt;&lt;P&gt;My setup:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Tailscale subnet router deployed directly within the same VPC as the Databricks workspace (no separate transit VPC)&lt;/LI&gt;&lt;LI&gt;Subnet router advertises the entire VPC CIDR, making all workspace resources accessible to Tailscale clients&lt;/LI&gt;&lt;LI&gt;Existing backend workspace VPC endpoint already configured for cluster-to-control-plane REST API communication&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;My question: since my Tailscale subnet router can directly reach the backend endpoint's private IP within the same VPC, could I theoretically reuse this existing workspace endpoint for frontend user access as well, instead of creating a separate frontend endpoint?&lt;/P&gt;</description>
      <pubDate>Fri, 14 Nov 2025 03:37:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/aws-databricks-with-frontend-private-link/m-p/138991#M4465</guid>
      <dc:creator>margarita_shir</dc:creator>
      <dc:date>2025-11-14T03:37:23Z</dc:date>
    </item>
    <item>
      <title>Re: aws databricks with frontend private link</title>
      <link>https://community.databricks.com/t5/administration-architecture/aws-databricks-with-frontend-private-link/m-p/139151#M4471</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/197721"&gt;@margarita_shir&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="qt3gz91 paragraph"&gt;Short answer: Yes—if your clients can privately reach the existing Databricks “Workspace (including REST API)” interface endpoint, you can reuse that same VPC endpoint for front‑end (user) access. You must not try to use the secure cluster connectivity (SCC) relay endpoint for users. The SCC relay is only for compute-to-control‑plane on port 6666; the “Workspace (including REST API)” service is the one that serves both the web UI and REST APIs for both front‑end and back‑end scenarios.&lt;/P&gt;
&lt;H3 class="_7uu25p0 qt3gz9c _7pq7t612 heading3 _7uu25p1"&gt;Why this works&lt;/H3&gt;
&lt;UL class="qt3gz97 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;The Databricks PrivateLink endpoint service named “Workspace (including REST API)” is used for both front‑end user access and back‑end REST from compute, so the same service behind your existing VPCE is valid for browsers, CLI, JDBC/ODBC, and tooling over HTTPS. You just need private reachability and the right DNS and Databricks settings. Do not use the SCC relay service for front‑end. It’s a different service and port (6666).&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;A “transit VPC” is the common pattern for front‑end, but it’s not a hard requirement. Front‑end PrivateLink endpoint traffic simply needs a private path from clients to the VPCE; your Tailscale subnet router in the workspace VPC satisfies that reachability requirement as long as it routes/advertises the VPCE’s private IPs to clients.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 class="_7uu25p0 qt3gz9c _7pq7t612 heading3 _7uu25p1"&gt;What you need to change to make it work&lt;/H3&gt;
&lt;UL class="qt3gz97 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Private Access Settings (PAS): Add the existing Workspace (REST) VPCE registration to the workspace’s PAS and set the access level so the workspace will accept front‑end connections from that endpoint (Endpoint or Account as appropriate). This is what authorizes your front‑end traffic through that VPCE.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Internal DNS: Make your workspace URL resolve to the private IP of that same Workspace (REST) VPCE for your Tailscale clients. In practice, configure your internal DNS so the workspace hostname maps to the VPCE’s private IP; Databricks provides regional privatelink hostnames you can map for this purpose. This is the critical step that steers browser/API traffic privately to the endpoint instead of the public internet.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;IdP redirect (only if using SSO): Add the Databricks “PrivateLink Redirect URI” to your identity provider so browser-based SSO completes over the private path. Keep the original (public) redirect URL if you also have non‑PrivateLink workspaces.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Security groups on the VPCE: Ensure the VPCE’s security group allows inbound/outbound HTTPS (443) from your Tailscale-advertised address space, while still allowing any ports your compute needs for back‑end REST (for example, 8443 for internal control-plane API calls). Databricks recommends separate security groups per endpoint following least privilege, but it’s not required; you can widen the existing SG if that’s simpler.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Registration state: If you originally registered the VPCE only in the “network configuration” for back‑end, you can also reference the same VPCE registration in PAS for front‑end authorization; registrations are generic. You don’t need to create a second, separate VPCE solely for front‑end if you can reach the existing one.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
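&lt;P&gt;As a rough sketch of the PAS and endpoint-registration steps above (untested against your account; the variable names and resource names here are assumptions you would adapt), the wiring looks roughly like this with the Databricks Terraform provider:&lt;/P&gt;
&lt;PRE&gt;# Hypothetical sketch: register the existing workspace VPC endpoint and
# authorize it for front-end access via Private Access Settings.
resource "databricks_mws_vpc_endpoint" "workspace_rest" {
  account_id          = var.databricks_account_id
  aws_vpc_endpoint_id = var.existing_workspace_vpce_id  # VPCE already used for back-end REST
  vpc_endpoint_name   = "workspace-rest-shared"
  region              = var.region
}

resource "databricks_mws_private_access_settings" "pas" {
  private_access_settings_name = "frontend-via-existing-vpce"
  region                       = var.region
  public_access_enabled        = false        # force private-only front-end access
  private_access_level         = "ENDPOINT"   # only the listed endpoints may connect
  allowed_vpc_endpoint_ids     = [databricks_mws_vpc_endpoint.workspace_rest.vpc_endpoint_id]
}&lt;/PRE&gt;
&lt;P&gt;The workspace resource then references this PAS via its private_access_settings_id, so the same registered endpoint serves both back‑end REST and front‑end user traffic.&lt;/P&gt;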
&lt;H3 class="_7uu25p0 qt3gz9c _7pq7t612 heading3 _7uu25p1"&gt;Things not to do&lt;/H3&gt;
&lt;UL class="qt3gz97 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;Don’t try to point users at the SCC relay endpoint; it’s for the compute tunnel only (TCP 6666) and won’t serve the web UI or REST over HTTPS.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 class="_7uu25p0 qt3gz9c _7pq7t612 heading3 _7uu25p1"&gt;Validation tips&lt;/H3&gt;
&lt;UL class="qt3gz97 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;DNS test: From a Tailscale client, resolve your workspace hostname and confirm it returns the VPCE private IP you expect (for your region’s privatelink control-plane domain).&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="qt3gz9a"&gt;
&lt;P class="qt3gz91 paragraph"&gt;Connectivity test: From a Tailscale client, browse to the workspace URL or curl the REST root over HTTPS and verify you reach the UI/API privately; if using SSO, confirm the IdP roundtrip succeeds with the PrivateLink Redirect URI.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
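&lt;P&gt;Concretely, the two checks above can be run from any Tailscale client; the hostname below is a placeholder for your actual workspace URL:&lt;/P&gt;
&lt;PRE&gt;# 1. DNS test: should print the VPCE's private IP (e.g. 10.x.x.x), not a public IP
dig +short my-workspace.cloud.databricks.com

# 2. Connectivity test: fetch an HTTP status from the REST API over the private path
#    (a 401 is fine here; it proves you reached the Databricks API without credentials)
curl -s -o /dev/null -w "%{http_code}\n" https://my-workspace.cloud.databricks.com/api/2.0/clusters/list&lt;/PRE&gt;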
&lt;H3 class="_7uu25p0 qt3gz9c _7pq7t612 heading3 _7uu25p1"&gt;When you might still choose a separate front‑end VPCE&lt;/H3&gt;
&lt;UL class="qt3gz97 qt3gz92"&gt;
&lt;LI class="qt3gz9a"&gt;Operational isolation: Some teams maintain a distinct front‑end VPCE (often in a “shared services/transit” VPC) so they can manage different security groups, route tables, and DNS boundaries for user/browser traffic versus compute traffic. This is a best‑practice pattern but not strictly required for functionality.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="qt3gz91 paragraph"&gt;In summary: Reusing your existing “Workspace (including REST API)” VPCE for front‑end is supported and can work well with your Tailscale-based reachability, provided you update PAS, DNS, IdP (if applicable), and security group rules accordingly. The SCC relay VPCE cannot be reused for front‑end traffic.&lt;/P&gt;
&lt;DIV class="_7pq7t614 _7pq7t6cl wrz27r2 wrz27r0"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="_7pq7t614 _7pq7t6cl wrz27r2 wrz27r0"&gt;Hope these hints/tips are helpful.&lt;/DIV&gt;
&lt;DIV class="_7pq7t614 _7pq7t6cl wrz27r2 wrz27r0"&gt;Cheers, Louis.&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 15 Nov 2025 04:01:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/aws-databricks-with-frontend-private-link/m-p/139151#M4471</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-11-15T04:01:02Z</dc:date>
    </item>
    <item>
      <title>Re: aws databricks deployment with custom configurations</title>
      <link>https://community.databricks.com/t5/administration-architecture/aws-databricks-with-frontend-private-link/m-p/142380#M4669</link>
<description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I have a question about the &lt;STRONG&gt;IAM role for workspace root storage&lt;/STRONG&gt; when deploying Databricks on AWS with custom configurations (customer-managed VPC, storage configurations, credential configurations, etc.).&lt;/P&gt;&lt;P&gt;At an earlier stage of our deployment, I was following the manual setup documentation here:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/admin/workspace/create-uc-workspace" target="_blank"&gt;https://docs.databricks.com/aws/en/admin/workspace/create-uc-workspace&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Specifically this step:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/aws/en/admin/workspace/create-uc-workspace#create-a-storage-configuration" target="_blank"&gt;https://docs.databricks.com/aws/en/admin/workspace/create-uc-workspace#create-a-storage-configuration&lt;/A&gt;&lt;/P&gt;&lt;P&gt;This section describes creating a storage configuration for the workspace root S3 bucket, including an IAM role that Databricks assumes to access this bucket.&lt;/P&gt;&lt;P&gt;However, when managing the same setup via Terraform, the equivalent resource, databricks_mws_storage_configurations (documented here: &lt;A href="https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/aws-workspace#root-bucket" target="_blank"&gt;https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/aws-workspace#root-bucket&lt;/A&gt;), does not support specifying an IAM role at all, and the Terraform documentation omits creating or attaching a role for the root bucket entirely.&lt;/P&gt;&lt;P&gt;This raised a few questions for me:&lt;/P&gt;&lt;P&gt;Was the IAM role originally intended for Unity Catalog storage within the root bucket, but since deprecated in favor of separate storage?&lt;/P&gt;&lt;P&gt;Initially, I thought it might be a good idea to explicitly specify an S3 bucket path in the metastore resource (so-called metastore-level storage), but after reading more documentation, I realized that Databricks best practices recommend assigning storage at the catalog level (managed via external locations and storage credentials), in an S3 bucket separate from the root S3 bucket that stores workspace assets (such as data, libraries, and logs). Hence we create managed catalogs by specifying an external location resource, and Databricks auto-generates the subpath (e.g., s3://databricks-unitycatalog/cps_business_insights/__unitystorage/catalogs/1234fda622-2cfb-478f-bbc4-b9cb84242baf).&lt;/P&gt;&lt;P&gt;Is the modern best practice therefore: a root S3 bucket (accessed via bucket policy only) that stores workspace assets (notebooks, cluster logs, libraries), plus a separate Unity Catalog metastore bucket (with its own IAM role)?&lt;/P&gt;&lt;P&gt;Can anyone clarify whether this understanding is correct from a security best-practices perspective?&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Mon, 22 Dec 2025 21:09:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/aws-databricks-with-frontend-private-link/m-p/142380#M4669</guid>
      <dc:creator>margarita_shir</dc:creator>
      <dc:date>2025-12-22T21:09:57Z</dc:date>
    </item>
  </channel>
</rss>

