Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Resource organization in a large company

cgrass
New Contributor

Hello,
We are using Azure Databricks in a single tenant. We will have many teams working in multiple (Unity Catalog-enabled) Workspaces using a variety of Catalogs, External Locations, Storage Credentials, etc. Some of those resources will be shared (e.g., an External Location for a common Storage Account), and some will be specific to a team or Workspace. Our parent company controls the Admin Account, so most work will need to be done in the context of a Workspace (via Workspace Admin permissions).

While setting up our infrastructure (managed by Terraform), I realized I might have the wrong mental model. This diagram shows that Storage Credentials, External Locations, and Catalogs are not directly associated with a Workspace. However, they must be created from within a Workspace and are initially "connected" to it. It seems you can immediately "disconnect" a Catalog from a Workspace after creation, but this feels a little awkward.

What is considered best practice for organizing and maintaining these resources via Terraform in a company with many business units?

  1. Is it common to have an "AdministrativeWorkspace" that is used to create resources and manage shared resources?
  2. What granularity should Catalogs be defined at? Prod/Stage/Dev likely won't work for us because we have business units with entirely different security requirements. Have other teams had success with something like `{business_unit}_prod`, `{business_unit}_dev`, or does each team within a business unit have its own collection of catalogs?
    1. If the former, users, schemas, and maybe even table definitions will have to be managed in a single location (an IaC live repo in our case).
    2. If the latter, how do you manage resource collisions or shared resources? E.g., I noticed that two External Locations cannot be defined within two workspaces if they use the same path.
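
To make the collision in 2.2 concrete, here is a minimal sketch of a pre-apply check that scans the External Locations planned across Workspaces and reports any pair pointing at the same path. Everything in it is hypothetical: the `PlannedLocation` type, the location and Workspace names, and the storage URLs are made up for illustration, and this is not part of Terraform or the Databricks provider. It also flags nested paths (one path inside another), which is an assumption that goes beyond the identical-path case described above.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class PlannedLocation(NamedTuple):
    name: str       # hypothetical External Location name
    workspace: str  # hypothetical Workspace whose Terraform config declares it
    url: str        # storage path the External Location would point at


def normalize(url: str) -> str:
    """Drop trailing slashes so 'path' and 'path/' compare as equal."""
    return url.rstrip("/")


def overlaps(a: str, b: str) -> bool:
    """True if the two paths are identical or one is nested inside the other."""
    a, b = normalize(a), normalize(b)
    return a == b or a.startswith(b + "/") or b.startswith(a + "/")


def find_collisions(locations: List[PlannedLocation]) -> List[Tuple[str, str]]:
    """Return the names of every pair of planned locations whose paths overlap."""
    return [
        (x.name, y.name)
        for x, y in combinations(locations, 2)
        if overlaps(x.url, y.url)
    ]


# Hypothetical plan: two Workspaces both declare the common Storage Account.
planned = [
    PlannedLocation("shared_raw", "ws_finance",
                    "abfss://raw@commonsa.dfs.core.windows.net/"),
    PlannedLocation("shared_raw_copy", "ws_marketing",
                    "abfss://raw@commonsa.dfs.core.windows.net"),
    PlannedLocation("finance_curated", "ws_finance",
                    "abfss://curated@financesa.dfs.core.windows.net/data"),
]

for first, second in find_collisions(planned):
    print(f"collision: {first} <-> {second}")
```

A check like this could run in CI ahead of `terraform plan`, so that a shared path ends up declared once in a single place rather than once per Workspace.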

I suspect all of this has been asked and discussed before, but my googling skills failed me today. I would be happy to read a 10,000-word blog post if anyone can point me in the right direction.

Thanks!

