Hello,
We are using Azure Databricks in a single tenant. We will have many teams working in multiple Unity Catalog-enabled Workspaces, using a variety of Catalogs, External Locations, Storage Credentials, etc. Some of those resources will be shared (e.g., an External Location for a common Storage Account), and some will be specific to a team or Workspace. Our parent company controls the Admin Account, so most work will need to be done in the context of a Workspace (via Workspace Admin permissions).
While setting up our infrastructure (managed by Terraform), I realized I might have the wrong mental model. This diagram shows that Storage Credentials, External Locations, and Catalogs are not directly associated with a Workspace. However, they must be created through a Workspace and are initially "connected" to it. It seems you can "disconnect" a Catalog from a Workspace immediately after creation, but this feels a little awkward.
What is considered best practice for organizing and maintaining these resources via Terraform in a company with many business units?
- Is it common to have an "AdministrativeWorkspace" that is used to create resources and manage shared resources?
- What granularity should Catalogs be defined at? Prod/Stage/Dev likely won't work for us because we have business units with entirely different security requirements. Have other teams had success with something like `{business_unit}_prod`, `{business_unit}_dev`, or does each team within a business unit have its own collection of catalogs?
  - If the former, users, schemas, and maybe even table definitions will have to be managed in a single location (an IaC live repo in our case).
  - If the latter, how do you manage resource collisions or shared resources? E.g., I noticed that two External Locations cannot be defined in two different Workspaces if they use the same path.
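To make the naming and collision questions concrete, here is a rough Python sketch of what we are considering: generate catalog names per business unit and environment, and catch duplicate External Location paths before anything reaches Terraform. The business units, environments, location names, and storage URLs are all hypothetical placeholders, and the check only covers identical paths (the collision I actually ran into), not every overlap Unity Catalog might reject.

```python
from collections import defaultdict
from itertools import product

# Hypothetical business units and environments; placeholders only.
BUSINESS_UNITS = ["finance", "marketing"]
ENVIRONMENTS = ["prod", "dev"]


def catalog_names(units, envs):
    """Return catalog names following the {business_unit}_{env} pattern."""
    return [f"{unit}_{env}" for unit, env in product(units, envs)]


def duplicate_locations(locations):
    """Group External Location names by storage URL and keep only the URLs
    claimed by more than one location (identical paths only)."""
    by_url = defaultdict(list)
    for name, url in locations.items():
        # Strip trailing slashes so ".../data" and ".../data/" are treated as the same path.
        by_url[url.rstrip("/")].append(name)
    return {url: names for url, names in by_url.items() if len(names) > 1}


if __name__ == "__main__":
    print(catalog_names(BUSINESS_UNITS, ENVIRONMENTS))

    # Hypothetical External Locations declared by two different teams.
    locations = {
        "finance_shared": "abfss://shared@commonstorage.dfs.core.windows.net/data",
        "marketing_shared": "abfss://shared@commonstorage.dfs.core.windows.net/data/",
        "finance_raw": "abfss://raw@financestorage.dfs.core.windows.net/",
    }
    print(duplicate_locations(locations))
```

A check like this could run in CI against the IaC live repo before `terraform plan`, but I don't know whether that is how other teams handle shared locations, which is part of what I'm asking.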
I suspect all of this has been asked and discussed before, but my Googling skills failed me today. I would be happy to read a 10,000-word blog post if anyone can point me in the right direction.
Thanks!