Best practices for structuring Databricks workspaces for CI/CD and ML workflows

JoaoPigozzo
New Contributor III

Hi everyone,

I’m designing the CI/CD process for our environment, which is focused on machine learning and data science projects, and I’d like to understand the best practices for workspace organization, especially when using Unity Catalog.

I’m currently considering a few possible approaches:

  • A single workspace for all ML and data projects.

  • Separate workspace(s) per project.

  • Separate workspace(s) per business area.

  • Or a hierarchical model, where each project has its own workspace for development, and once it’s ready, it’s promoted to the workspace of the corresponding business area.

What are the recommended practices or patterns the community has found effective for managing multiple workspace(s) in this type of setup?

Thanks in advance for any insights!

1 ACCEPTED SOLUTION

mark_ott
Databricks Employee

When designing a CI/CD process for Databricks environments — especially for machine learning and data science projects using Unity Catalog — enterprise-scale workspace organization should balance isolation, governance, and collaboration. The recommended practice is to minimize the number of workspaces while using Unity Catalog’s governance features for secure, logical isolation within shared workspaces rather than always relying on full workspace separation.


Recommended Workspace Organization Patterns

1. Single Shared Set of Workspaces with Unity Catalog

For most mid-to-large organizations, one shared set of workspaces (DEV, STG, PROD), each linked to a single Unity Catalog metastore, offers simplicity and centralized governance.
Unity Catalog allows team- or project-level segregation through catalogs and schemas without creating redundant workspaces.

Typical structure:

  • Catalog per environment scope or business domain (e.g., sales_dev, sales_prod)

  • Schema per project or team inside the catalog (e.g., sales_dev.team1, sales_prod.analytics)

This setup takes advantage of hierarchical privilege inheritance, reducing administrative overhead while supporting environment promotion via CI/CD pipelines.
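The inheritance idea can be illustrated with a small, self-contained Python model. This is only a sketch of the concept, not Unity Catalog's actual implementation: the `Securable` class, the group names, and the `features` table are hypothetical, and the model ignores details such as the separate USE CATALOG and USE SCHEMA privileges that real access also requires.

```python
from dataclasses import dataclass, field


@dataclass
class Securable:
    """A catalog, schema, or table in a simplified three-level hierarchy."""

    name: str
    parent: "Securable | None" = None
    # Maps a principal (for example a group name) to privileges granted directly here.
    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, principal: str, privilege: str) -> None:
        """Record a privilege granted directly on this object."""
        self.grants.setdefault(principal, set()).add(privilege)

    def full_name(self) -> str:
        """Return the dotted name, e.g. catalog.schema.table."""
        if self.parent is None:
            return self.name
        return f"{self.parent.full_name()}.{self.name}"

    def has_privilege(self, principal: str, privilege: str) -> bool:
        """Check this object and every ancestor, mimicking downward inheritance."""
        node = self
        while node is not None:
            if privilege in node.grants.get(principal, set()):
                return True
            node = node.parent
        return False


# Hypothetical layout: catalog sales_dev, schema team1, table features.
sales_dev = Securable("sales_dev")
team1 = Securable("team1", parent=sales_dev)
features = Securable("features", parent=team1)

# A single grant at the catalog level covers everything beneath it.
sales_dev.grant("sales-data-scientists", "SELECT")

print(features.full_name())
print(features.has_privilege("sales-data-scientists", "SELECT"))
print(features.has_privilege("finance-analysts", "SELECT"))
```

One grant on the catalog replaces a grant per schema or table, which is where the reduction in administrative overhead comes from.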

Best suited for:

  • Organizations standardizing development, staging, and production lifecycles.

  • Teams needing shared governance and collaboration without strict isolation demands.


2. Line-of-Business (LOB) Isolation

Larger enterprises often align workspace organization with business units (LOBs), with each LOB having its own Dev, Stg, and Prod workspaces. All workspaces in a region share a single Unity Catalog metastore for consistent policy enforcement and data sharing.

Advantages:

  • Clear data and user isolation per business area.

  • Easier cost attribution and admin responsibility separation.

  • Supports independent CI/CD automation across LOBs.

Drawbacks include greater operational complexity and the need for a centralized Center of Excellence (COE) to standardize pipelines and permissions.


3. Project or Data Product Isolation

For organizations focused on individual ML or analytics products, workspaces can be isolated per data or ML product rather than by department. This approach provides flexibility when cross-functional collaboration is needed or when projects vary in sensitivity and scope.

Typical structure:

  • Shared dev workspace for experimentation.

  • Dedicated production workspace for each flagship model or product.

  • Shared Unity Catalog to ensure data versioning and lineage across products.


4. Hybrid (Hierarchical) Model

A common enterprise compromise integrates the above models:

  • Development workspaces per project or team, promoting validated assets to a shared business-area workspace for production management.

  • A single regional Unity Catalog metastore spans all workspaces for centralized lineage and auditing.

This model supports flexibility and team autonomy while retaining centralized security and compliance through Unity Catalog.
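As a rough sketch of what "promoting validated assets" could look like for a registered model, the Python below only computes the source and destination identifiers for a promotion step. The catalog, schema, model, and alias names are hypothetical, and the three-level naming and `models:/...@alias` URI form are assumptions based on how MLflow models are usually addressed under Unity Catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromotionPlan:
    """Identifiers a CI/CD job would need to copy a model between catalogs."""

    source_uri: str
    destination_name: str


def plan_promotion(
    model: str,
    dev_catalog: str,
    prod_catalog: str,
    schema: str = "models",     # hypothetical schema name
    alias: str = "candidate",   # hypothetical alias marking the validated version
) -> PromotionPlan:
    """Build the source URI and destination name for a model promotion."""
    return PromotionPlan(
        source_uri=f"models:/{dev_catalog}.{schema}.{model}@{alias}",
        destination_name=f"{prod_catalog}.{schema}.{model}",
    )


# Project-level dev catalog promoted into a business-area prod catalog.
plan = plan_promotion("churn_classifier", "churn_project_dev", "sales_prod")
print(plan.source_uri)
print(plan.destination_name)

# With MLflow installed and configured, these values could likely be passed to
# MlflowClient.copy_model_version(src_model_uri=..., dst_name=...); check the
# MLflow documentation for your version before relying on that call.
```

Keeping the naming logic in one function means the project-to-business-area mapping lives in a single reviewed place rather than in each pipeline.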


Unity Catalog Best Practices for Multi-Workspace CI/CD

From the Databricks Unity Catalog governance recommendations:

  • Use one metastore per region and link all workspaces in that region to it.

  • Organize catalogs by environment, team, or business unit—catalogs become the main unit of isolation, not separate metastores.

  • Grant privileges through groups from your identity provider (IdP), not manually per workspace.

  • Reserve MODIFY access for service principals in production; use CI/CD to promote code and models.

  • Use managed tables and volumes for most assets, limiting external tables to legacy or cross-platform integrations.

  • Bind external locations or catalogs to specific workspaces only when isolation requirements demand it.
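To make the group-based grants and the "MODIFY only for service principals" points concrete, here is a minimal Python sketch that generates GRANT statements for a production catalog. The group name and the `cicd-deployer` principal are hypothetical, and the exact privilege list is an assumption; check it against the Unity Catalog privilege reference before running the output anywhere.

```python
def grant_statements(catalog: str, reader_groups: list[str], deployer: str) -> list[str]:
    """Return GRANT statements: read-only for groups, read/write for one deployer.

    Granting at the catalog level relies on privileges being inherited by the
    schemas and tables inside the catalog.
    """
    read_privs = "USE CATALOG, USE SCHEMA, SELECT"
    statements = [
        f"GRANT {read_privs} ON CATALOG {catalog} TO `{group}`"
        for group in reader_groups
    ]
    # Only the CI/CD service principal may modify production data.
    statements.append(
        f"GRANT {read_privs}, MODIFY ON CATALOG {catalog} TO `{deployer}`"
    )
    return statements


statements = grant_statements(
    catalog="sales_prod",
    reader_groups=["sales-data-scientists"],  # hypothetical IdP-synced group
    deployer="cicd-deployer",                 # hypothetical service principal
)
for statement in statements:
    print(statement)
```

Generating the statements from a short list of groups keeps the permission model reviewable in version control, which fits the advice to avoid manual per-workspace grants.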


Practical Guidance for ML-Focused CI/CD

For machine learning environments with CI/CD:

  • Manage reproducibility and data versioning using Delta Lake and MLflow under Unity Catalog.

  • Use Databricks Asset Bundles (YAML) to define environment-specific deployment configurations.

  • Integrate CI/CD pipelines with model registry promotion across Unity Catalog–linked workspaces.

  • Adopt Terraform or other IaC templates to automate the provisioning of workspaces, catalogs, and cluster policies.
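The environment-specific configuration idea can be sketched in Python as a mapping with one entry per deployment target. The key names (`targets`, `mode`, `default`, `workspace.host`, `variables`) follow what I understand the Databricks Asset Bundles `databricks.yml` layout to be, but treat that as an assumption and verify against the bundle configuration reference; the hosts and the `<project>_<env>` catalog convention are hypothetical placeholders.

```python
import json


def build_targets(project: str, hosts: dict[str, str]) -> dict:
    """Build one deployment target per environment.

    Each target carries its own workspace host and a `catalog` variable so the
    same job or pipeline definition can point at a different catalog per stage.
    """
    targets = {}
    for env, host in hosts.items():
        target = {
            "mode": "development" if env == "dev" else "production",
            "workspace": {"host": host},
            "variables": {"catalog": f"{project}_{env}"},
        }
        if env == "dev":
            # Make dev the target used when none is specified.
            target["default"] = True
        targets[env] = target
    return {"targets": targets}


# Placeholder hosts; replace with real workspace URLs.
config = build_targets(
    "sales",
    {
        "dev": "https://dev.example.com",
        "stg": "https://stg.example.com",
        "prod": "https://prod.example.com",
    },
)
print(json.dumps(config, indent=2))
```

In an actual bundle this structure would be written by hand in YAML; generating it here just shows how little differs between environments once the catalog is parameterized.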


Summary Recommendation

| Organization Strategy | Recommended For | Key Unity Catalog Design |
| Single shared set (Dev/Stg/Prod) | Small to medium organizations | Single metastore, catalogs per business area |
| LOB-based | Large enterprises with strict isolation | Multiple workspaces per LOB, one metastore |
| Project/Data Product-based | Cross-functional ML teams | Shared dev/stg workspace, isolated prod per project |
| Hybrid hierarchical | Enterprises with both governance and flexibility needs | Dev workspaces per project, centralized prod catalog |

In 2025, the community and Databricks’ own guidance converge on this principle: use as few workspaces as practical and rely on Unity Catalog for logical data and permission isolation, reserving new workspaces only for strong security, compliance, or regional boundaries.


2 REPLIES

JoaoPigozzo
New Contributor III

Thank you @mark_ott. I tend to believe that a single shared set of workspaces (Development, Staging, Production) would be the best choice for balancing simplicity and governance at our current level of complexity. I also believe that FinOps objectives can be achieved through effective tagging policies.