Administration & Architecture

Databricks AWS deployment with custom configurations (workspace root storage)

margarita_shir
New Contributor II

Hi everyone,

I have a question about the IAM role for workspace root storage when deploying Databricks on AWS with custom configurations (customer-managed VPC, storage configurations, credential configurations, etc.).

At an earlier stage of our deployment, I was following the manual setup documentation here:

https://docs.databricks.com/aws/en/admin/workspace/create-uc-workspace

Specifically this step:

https://docs.databricks.com/aws/en/admin/workspace/create-uc-workspace#create-a-storage-configuratio...

This section describes creating a storage configuration for the workspace root S3 bucket and includes creating an IAM role that Databricks assumes to access this bucket.

However, when managing the same setup via Terraform, the equivalent resource, databricks_mws_storage_configurations (documented here:

https://registry.terraform.io/providers/databricks/databricks/latest/docs/guides/aws-workspace#root-...), does not support specifying an IAM role at all, and the Terraform documentation entirely omits creating or attaching a role for the root bucket.
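For context, here is a minimal sketch of what that resource looks like in my setup (bucket and variable names are illustrative, following the provider's documented arguments) — note there is indeed no role-related argument:

```hcl
# Minimal sketch of the workspace root storage configuration.
# Names are illustrative; only account_id, bucket_name, and
# storage_configuration_name are accepted here - no IAM role.
resource "databricks_mws_storage_configurations" "this" {
  provider                   = databricks.mws
  account_id                 = var.databricks_account_id
  storage_configuration_name = "workspace-root-storage"
  bucket_name                = aws_s3_bucket.root_storage_bucket.bucket
}
```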

This raised a few questions for me:

Was the IAM role originally intended for Unity Catalog storage within the root bucket, but has since been deprecated in favor of separate storage?

Initially, I thought it might be a good idea to explicitly specify an S3 bucket path in the metastore resource (so-called metastore-level storage). After reading more documentation, though, I realized that Databricks best practices recommend assigning storage at the catalog level instead, managed through external locations and storage credentials. That storage lives in its own S3 bucket, separate from the root S3 bucket used for workspace assets (such as data, libraries, and logs). So we create managed catalogs by specifying an external location resource, and Databricks auto-generates the subpath (e.g., s3://databricks-unitycatalog/cps_business_insights/__unitystorage/catalogs/1234fda622-2cfb-478f-bbc4-b9cb84242baf).
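For reference, a trimmed sketch of the catalog-level setup I ended up with (resource names and the role ARN are illustrative; the arguments follow the provider docs for databricks_storage_credential, databricks_external_location, and databricks_catalog):

```hcl
# Storage credential wrapping the IAM role Unity Catalog assumes
# for catalog storage (illustrative ARN).
resource "databricks_storage_credential" "uc" {
  name = "uc-catalog-credential"
  aws_iam_role {
    role_arn = "arn:aws:iam::111122223333:role/uc-catalog-access"
  }
}

# External location pointing at the dedicated (non-root) catalog bucket.
resource "databricks_external_location" "uc" {
  name            = "uc-catalog-location"
  url             = "s3://databricks-unitycatalog"
  credential_name = databricks_storage_credential.uc.name
}

# Catalog whose managed storage lands under the external location;
# Databricks generates the __unitystorage/catalogs/... subpath itself.
resource "databricks_catalog" "business_insights" {
  name         = "cps_business_insights"
  storage_root = databricks_external_location.uc.url
}
```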

Is the modern best practice therefore: root S3 bucket (accessed via bucket policy only) → stores workspace assets (notebooks, cluster logs, libraries); separate Unity Catalog metastore/catalog bucket (with its own IAM role)?
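In other words, the root bucket in my deployment is granted to Databricks by a bucket policy alone. A trimmed sketch of that policy (the principal 414351767826 is the Databricks AWS account per the docs, but treat the exact statement as illustrative and check it against the current documentation):

```hcl
# Root bucket policy granting the Databricks control plane access.
# Trimmed/illustrative - no IAM role on our side is involved.
resource "aws_s3_bucket_policy" "root_bucket" {
  bucket = aws_s3_bucket.root_storage_bucket.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "GrantDatabricksAccess"
      Effect    = "Allow"
      Principal = { AWS = "arn:aws:iam::414351767826:root" }
      Action = [
        "s3:GetObject", "s3:GetBucketLocation", "s3:PutObject",
        "s3:DeleteObject", "s3:ListBucket"
      ]
      Resource = [
        aws_s3_bucket.root_storage_bucket.arn,
        "${aws_s3_bucket.root_storage_bucket.arn}/*"
      ]
    }]
  })
}
```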

Can anyone clarify if this understanding is correct from a security best practices perspective?

Thanks in advance!
