Databricks Community

JoaoPigozzo · 10-23-2025

Hi everyone,

I’m designing the CI/CD process for our environment, which is focused on machine learning and data science projects, and I’d like to understand the best practices for workspace organization, especially when using Unity Catalog.

I’m currently considering a few possible approaches:

A single workspace for all ML and data projects.
Separate workspace(s) per project.
Separate workspace(s) per business area.
Or a hierarchical model, where each project has its own workspace for development, and once it’s ready, it’s promoted to the workspace of the corresponding business area.

What practices or patterns has the community found effective for managing multiple workspaces in this type of setup?

Thanks in advance for any insights!

mark_ott · 10-24-2025

When designing a CI/CD process for Databricks machine learning and data science projects that use Unity Catalog, workspace organization should balance isolation, governance, and collaboration. The recommended practice is to keep the number of workspaces small and use Unity Catalog’s governance features for secure, logical isolation within shared workspaces, rather than defaulting to full workspace separation.

Recommended Workspace Organization Patterns

1. Single Shared Set of Workspaces with Unity Catalog

For many small to mid-sized organizations, one shared set of workspaces (DEV, STG, PROD), all attached to the same Unity Catalog metastore, offers simplicity and centralized governance.
Unity Catalog allows team- or project-level segregation through catalogs and schemas without creating redundant workspaces.

Typical structure:

Catalog per environment and business domain (e.g., sales_dev, sales_prod)
Schema per project or team inside each catalog (e.g., sales_dev.team1, finance_prod.analytics)

This setup takes advantage of hierarchical privilege inheritance (a privilege granted on a catalog applies to the schemas and objects inside it), reducing administrative overhead while supporting environment promotion via CI/CD pipelines.

Best suited for:

Organizations standardizing development, staging, and production lifecycles.
Teams needing shared governance and collaboration without strict isolation demands.
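
A minimal sketch of how the catalog-per-environment layout described above might be bootstrapped from a CI/CD job. The domain, team, and environment names (sales, finance, team1, analytics, dev/stg/prod) are hypothetical. The code only builds the SQL strings; in a Databricks job you could run each one with spark.sql(...). Depending on how your metastore is configured, CREATE CATALOG may also need a MANAGED LOCATION clause.

```python
"""Sketch: generate Unity Catalog DDL for a catalog-per-environment layout.

All domain, schema, and environment names here are hypothetical examples.
"""

ENVIRONMENTS = ["dev", "stg", "prod"]


def catalog_name(domain: str, env: str) -> str:
    """Return the catalog for one business domain in one environment, e.g. sales_dev."""
    return f"{domain}_{env}"


def layout_ddl(domains: dict[str, list[str]], envs: list[str] = ENVIRONMENTS) -> list[str]:
    """Build CREATE CATALOG / CREATE SCHEMA statements for every domain and environment.

    `domains` maps a domain name to the team or project schemas it should contain.
    """
    statements = []
    for domain, schemas in domains.items():
        for env in envs:
            catalog = catalog_name(domain, env)
            statements.append(f"CREATE CATALOG IF NOT EXISTS {catalog}")
            for schema in schemas:
                # Schemas are created inside each environment's catalog.
                statements.append(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}")
    return statements


if __name__ == "__main__":
    for stmt in layout_ddl({"sales": ["team1"], "finance": ["analytics"]}):
        print(stmt)
```

Because the statements use IF NOT EXISTS, the same job can be rerun on every deployment without failing on objects that already exist.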

2. Line-of-Business (LOB) Isolation

Larger enterprises often align workspaces with lines of business (LOBs), giving each LOB its own Dev, Stg, and Prod workspaces. Workspaces in the same region still attach to a single Unity Catalog metastore for consistent policy enforcement and data sharing.

Advantages:

Clear data and user isolation per business area.
Easier cost attribution and admin responsibility separation.
Supports independent CI/CD automation per LOB.

Drawbacks include greater operational complexity and the need for a Center of Excellence (COE) to standardize pipelines and permissions across LOBs.
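
For cost attribution per LOB, one common approach is a cluster policy that pins custom tags, so every cluster created under the policy carries them into billing data. The sketch below only builds the policy definition JSON; the tag keys lob and cost_center and the value CC-1001 are hypothetical. The resulting string could be supplied as the policy definition through the Cluster Policies API, the Terraform databricks_cluster_policy resource, or the UI.

```python
import json


def lob_cluster_policy(lob: str, cost_center: str) -> str:
    """Build a cluster policy definition that pins cost-attribution tags for one LOB.

    The tag keys "lob" and "cost_center" are hypothetical; use your own tagging scheme.
    """
    definition = {
        # "fixed" elements set the attribute value and prevent users from changing it.
        "custom_tags.lob": {"type": "fixed", "value": lob},
        "custom_tags.cost_center": {"type": "fixed", "value": cost_center},
    }
    return json.dumps(definition, indent=2)


if __name__ == "__main__":
    print(lob_cluster_policy("sales", "CC-1001"))
```

Cluster tags show up in billable usage data (for example, the custom_tags column of the system.billing.usage system table), which is what makes per-LOB cost reports possible.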

3. Project or Data Product Isolation

For organizations focused on individual ML or analytics products, workspaces can be isolated per data or ML product rather than by department. This approach provides flexibility when cross-functional collaboration is needed or when projects vary in sensitivity and scope.

Typical structure:

Shared dev workspace for experimentation.
Dedicated production workspace for each flagship model or product.
Shared Unity Catalog metastore for governance and lineage across products, with Delta Lake providing data versioning.

4. Hybrid (Hierarchical) Model

A common enterprise compromise integrates the above models:

Development workspaces per project or team, promoting validated assets to a shared business-area workspace for production management.
A single regional Unity Catalog metastore spans all workspaces for centralized lineage and auditing.

This model supports flexibility and team autonomy while retaining centralized security and compliance through Unity Catalog.

Unity Catalog Best Practices for Multi-Workspace CI/CD

From Databricks’ Unity Catalog governance recommendations:

Use one metastore per region and link all workspaces in that region to it.
Organize catalogs by environment, team, or business unit—catalogs become the main unit of isolation, not separate metastores.
Grant privileges to account-level groups synced from your identity provider (IdP), not to individual users or workspace-local groups.
Reserve MODIFY access for service principals in production; use CI/CD to promote code and models.
Use managed tables and volumes for most assets, limiting external tables to legacy or cross-platform integrations.
Bind external locations or catalogs to specific workspaces only when isolation requirements demand it.
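
The group-based grants and the "MODIFY only for service principals in production" rule could be generated per environment roughly as follows. This is a sketch under assumptions: the group names and the service principal ID are hypothetical, service principals are referenced by their application ID in GRANT statements, and privileges are granted at catalog level so they inherit down to schemas and tables.

```python
"""Sketch: per-environment Unity Catalog grants driven by IdP groups.

Group names and the service principal application ID are hypothetical.
"""

READ_PRIVILEGES = "USE CATALOG, USE SCHEMA, SELECT"
WRITE_PRIVILEGES = "USE CATALOG, USE SCHEMA, SELECT, MODIFY, CREATE TABLE"


def env_grants(catalog: str, env: str, readers: str, engineers: str, deployer_sp: str) -> list[str]:
    """Return GRANT statements for one environment catalog.

    Readers always get read access. Engineers can write only outside production;
    in production, only the CI/CD service principal receives MODIFY.
    """
    statements = [f"GRANT {READ_PRIVILEGES} ON CATALOG {catalog} TO `{readers}`"]
    writers = [deployer_sp] if env == "prod" else [engineers, deployer_sp]
    for principal in writers:
        # Backticks quote principal names containing hyphens or other special characters.
        statements.append(f"GRANT {WRITE_PRIVILEGES} ON CATALOG {catalog} TO `{principal}`")
    return statements


if __name__ == "__main__":
    for stmt in env_grants("sales_prod", "prod", "sales-readers", "sales-engineers",
                           "00000000-0000-0000-0000-000000000000"):
        print(stmt)
```

Keeping these statements in version control and applying them from the pipeline means permission changes go through the same review process as code.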

Practical Guidance for ML-Focused CI/CD

For machine learning environments with CI/CD:

Manage reproducibility and data versioning using Delta Lake + MLflow under Unity Catalog.
Use Databricks Asset Bundles (YAML) to define environment-specific deployment configurations.
Integrate CI/CD pipelines with the Unity Catalog model registry, promoting model versions between environment catalogs and tracking them with aliases rather than legacy stages.
Adopt Terraform or other IaC templates to automate the provisioning of workspaces, catalogs, and cluster policies.
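
A minimal sketch of promoting a model between environment catalogs with the MLflow client and Models in Unity Catalog. The model name, schema, catalogs, and aliases (challenger on the source, champion on the target) are hypothetical, and the sketch assumes the identity running the job can read the source catalog and write to the target. copy_model_version and set_registered_model_alias are MLflow client methods, but check that your MLflow version supports them.

```python
"""Sketch: promote a Unity Catalog model version from one environment catalog to another.

The model, schema, catalog, and alias names are hypothetical placeholders.
"""


def promotion_names(model: str, schema: str, src_catalog: str, dst_catalog: str) -> tuple[str, str]:
    """Return the three-level (catalog.schema.model) source and target names."""
    return f"{src_catalog}.{schema}.{model}", f"{dst_catalog}.{schema}.{model}"


def promote(model: str, schema: str, src_catalog: str, dst_catalog: str,
            src_alias: str = "challenger", dst_alias: str = "champion") -> str:
    """Copy the version tagged `src_alias` into the target catalog and alias it there."""
    # Imported lazily so the naming helper above works without MLflow installed.
    import mlflow
    from mlflow.tracking import MlflowClient

    mlflow.set_registry_uri("databricks-uc")  # point the client at the Unity Catalog registry
    client = MlflowClient()
    src, dst = promotion_names(model, schema, src_catalog, dst_catalog)
    copied = client.copy_model_version(src_model_uri=f"models:/{src}@{src_alias}", dst_name=dst)
    client.set_registered_model_alias(name=dst, alias=dst_alias, version=copied.version)
    return str(copied.version)
```

A pipeline step might call promote("churn_model", "ml", "sales_stg", "sales_prod") only after validation checks pass; serving and batch jobs in production then load models:/sales_prod.ml.churn_model@champion instead of a hard-coded version number.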

Summary Recommendation

Organization Strategy	Recommended For	Key Unity Catalog Design
Single shared set (Dev/Stg/Prod)	Small to medium organizations	Single metastore, catalogs per business area
LOB-based	Large enterprises with strict isolation needs	Dev/Stg/Prod workspaces per LOB, one metastore per region
Project/Data Product-based	Cross-functional ML teams	Shared dev workspace, dedicated prod workspace per product
Hybrid hierarchical	Enterprises with both governance and flexibility needs	Dev workspaces per project, centralized prod catalog

Current Databricks guidance and community experience converge on one principle: use as few workspaces as practical, rely on Unity Catalog for logical data and permission isolation, and create additional workspaces only where security, compliance, or regional boundaries require them.

JoaoPigozzo · 10-27-2025

Thank you @mark_ott. I tend to believe that a single shared set of workspaces (Development, Staging, Production) would be the best choice, balancing simplicity and governance for our current level of complexity. I also believe FinOps objectives can be achieved through effective tagging policies.

Best practices for structuring databricks workspaces for CI/CD and ML workflows