
Pros and cons of putting all Databricks workspaces (dev, QA, prod) under one metastore

F_Goudarzi
New Contributor III

Hi there, 

If we have a separate workspace for each environment, how should we go about structuring the metastore? What are the pros and cons of putting all workspaces under one metastore instead of having a separate metastore for each?

Thanks, Fatima

1 REPLY

Walter_C
Databricks Employee

Hello Fatima, many thanks for your question. Please note first that if all the workspaces belong to the same account ID and are in the same cloud region, they will all need to be associated with the same metastore, as you can only have one metastore per region.

If you require separate metastores, your workspaces will need to be created in different regions.

To be more specific about the pros and cons:

Pros of Using a Single Metastore for Multiple Workspaces:

  1. Centralized Data Governance: A single metastore allows for centralized management of data assets, making it easier to enforce consistent governance policies across all environments.
  2. Simplified Data Sharing: Data sharing between different environments (e.g., development, staging, production) is more straightforward since all data resides within the same metastore.
  3. Reduced Administrative Overhead: Managing one metastore reduces the complexity and administrative overhead compared to managing multiple metastores.
  4. Consistent Access Control: Access control policies can be uniformly applied across all workspaces, ensuring consistent security and compliance.

Cons of Using a Single Metastore for Multiple Workspaces:

  1. Risk of Data Contamination: If not properly managed, there is a risk of data from different environments (e.g., development data) being accessed or modified inappropriately, potentially leading to data contamination.
  2. Performance Impact: A single metastore handling multiple workspaces might experience performance bottlenecks, especially if the workloads are heavy and concurrent.

Pros of Using Separate Metastores for Each Workspace:

  1. Isolation of Environments: Each environment (development, staging, production) is completely isolated, reducing the risk of data contamination and ensuring that changes in one environment do not affect others.
  2. Simplified Access Control: Access control is simpler to manage as each environment has its own set of permissions and policies.
  3. Improved Performance: Separate metastores can lead to better performance as each metastore handles a smaller, more focused set of workloads.

Cons of Using Separate Metastores for Each Workspace:

  1. Increased Administrative Overhead: Managing multiple metastores increases administrative complexity and overhead.
  2. Data Sharing Complexity: Sharing data between different environments becomes more complex and may require additional mechanisms like Delta Sharing.
  3. Inconsistent Governance: Ensuring consistent governance policies across multiple metastores can be challenging.

If you use a single metastore, you can set permissions on the catalogs, so that you, as a metastore admin, can control which catalogs are visible in each workspace and who can make changes to them.
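
As a rough illustration of the catalog-level permissions mentioned above, here is a minimal Python sketch that builds Unity Catalog `GRANT` statements for one catalog per environment. The catalog names (`dev`, `qa`, `prod`), the group names, and the choice of privileges are all hypothetical and only there for the example; adapt them to your own setup. The sketch only produces SQL strings and does not connect to Databricks.

```python
# Hypothetical layout: one catalog per environment, all in the same metastore.
# Catalog names, group names, and privilege choices are assumptions for
# illustration, not something taken from the thread.
CATALOG_GRANTS = {
    "dev": {"data-engineers": ["USE CATALOG", "USE SCHEMA", "SELECT", "MODIFY"]},
    "qa": {"qa-testers": ["USE CATALOG", "USE SCHEMA", "SELECT"]},
    "prod": {"analysts": ["USE CATALOG", "USE SCHEMA", "SELECT"]},
}


def grant_statements(grants):
    """Return one GRANT statement per (catalog, principal) pair."""
    statements = []
    for catalog, principals in grants.items():
        for principal, privileges in principals.items():
            privilege_list = ", ".join(privileges)
            statements.append(
                f"GRANT {privilege_list} ON CATALOG `{catalog}` TO `{principal}`;"
            )
    return statements


if __name__ == "__main__":
    # Print the statements so they can be reviewed before anyone runs them.
    for statement in grant_statements(CATALOG_GRANTS):
        print(statement)
```

The generated statements could then be reviewed and run by someone with sufficient rights on each catalog, for example in a notebook or the SQL editor. Giving the production catalog read-only grants for most groups is one way to reduce the data contamination risk listed above while still keeping everything in one metastore.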
