3 weeks ago
Hi everyone,
I’m designing the CI/CD process for our environment, which is focused on machine learning and data science projects, and I’d like to understand the best practices for workspace organization, especially when using Unity Catalog.
I’m currently considering a few possible approaches:
A single workspace for all ML and data projects.
Separate workspace(s) per project.
Separate workspace(s) per business area.
Or a hierarchical model, where each project has its own workspace for development, and once it’s ready, it’s promoted to the workspace of the corresponding business area.
What practices or patterns has the community found effective for managing multiple workspaces in this type of setup?
Thanks in advance for any insights!
3 weeks ago
When designing a CI/CD process for Databricks environments — especially for machine learning and data science projects using Unity Catalog — enterprise-scale workspace organization should balance isolation, governance, and collaboration. The recommended practice is to minimize the number of workspaces while using Unity Catalog’s governance features for secure, logical isolation within shared workspaces rather than always relying on full workspace separation.
For most mid-to-large organizations, one shared set of workspaces (DEV, STG, PROD), all attached to the same regional Unity Catalog metastore, offers simplicity and centralized governance.
Unity Catalog allows team- or project-level segregation through catalogs and schemas without creating redundant workspaces.
Typical structure:
Catalog per environment scope or business domain (e.g., sales_dev, sales_prod)
Schema per project or team inside the catalog (e.g., sales_dev.team1, finance_prod.analytics)
This setup takes advantage of hierarchical privilege inheritance, reducing administrative overhead while supporting environment promotion via CI/CD pipelines.
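The catalog-per-domain-and-environment convention can be expressed as a small helper that emits idempotent DDL. This is a minimal sketch. The function names, the `dev`/`stg`/`prod` suffixes, and the identifier rule are assumptions made for illustration. Depending on how the metastore is configured, `CREATE CATALOG` may also need a `MANAGED LOCATION` clause.

```python
import re

ENVIRONMENTS = ("dev", "stg", "prod")  # assumed environment suffixes
# Conservative identifier rule chosen for this sketch: lowercase letters, digits, underscores.
_IDENT = re.compile(r"^[a-z][a-z0-9_]*$")


def catalog_name(domain: str, env: str) -> str:
    """Build a catalog name such as 'sales_dev' from a business domain and an environment."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    if not _IDENT.match(domain):
        raise ValueError(f"invalid domain identifier: {domain!r}")
    return f"{domain}_{env}"


def layout_ddl(domain: str, env: str, schemas: list[str]) -> list[str]:
    """Return DDL for one catalog per domain/environment and one schema per project or team."""
    catalog = catalog_name(domain, env)
    statements = [f"CREATE CATALOG IF NOT EXISTS {catalog};"]
    for schema in schemas:
        if not _IDENT.match(schema):
            raise ValueError(f"invalid schema identifier: {schema!r}")
        statements.append(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema};")
    return statements


if __name__ == "__main__":
    for stmt in layout_ddl("sales", "dev", ["team1", "forecasting"]):
        print(stmt)
```

In a CI/CD pipeline, statements like these would typically be run by a deployment identity once per environment. The same code then produces `sales_dev`, `sales_stg`, and `sales_prod` with identical schema layouts.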
Best suited for:
Organizations standardizing development, staging, and production lifecycles.
Teams needing shared governance and collaboration without strict isolation demands.
Larger enterprises often align workspace organization with business units (LOBs), with each LOB having its own Dev, Stg, and Prod workspaces. All of these workspaces are attached to a single regional Unity Catalog metastore for consistent policy enforcement and data sharing.
Advantages:
Clear data and user isolation per business area.
Easier cost attribution and admin responsibility separation.
Supports independent CI/CD automation across LOBs.
Drawbacks include greater operational complexity and the need for a Center of Excellence (CoE) to standardize pipelines and permissions across LOBs.
For organizations focused on individual ML or analytics products, workspaces can be isolated per data or ML product rather than by department. This approach provides flexibility when cross-functional collaboration is needed or when projects vary in sensitivity and scope.
Typical structure:
Shared dev workspace for experimentation.
Dedicated production workspace for each flagship model or product.
A shared Unity Catalog metastore to keep governance and lineage consistent across products.
A common enterprise compromise integrates the above models:
Development workspaces per project or team, promoting validated assets to a shared business-area workspace for production management.
A single regional Unity Catalog metastore spans all workspaces for centralized lineage and auditing.
This model supports flexibility and team autonomy while retaining centralized security and compliance through Unity Catalog.
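The promotion path in this hybrid model maps closely to the hierarchical option raised in the question. It can be captured as a lookup from project to owning business area. This is a hypothetical sketch. The project names, the ownership table, and the workspace naming pattern are illustrative assumptions, not a Databricks convention.

```python
# Hypothetical registry: which business area owns each project.
PROJECT_OWNERS = {"churn_model": "sales", "fraud_scoring": "finance"}


def deploy_target(project: str, stage: str) -> dict[str, str]:
    """Resolve where a project's assets live at a given stage of the hybrid model."""
    area = PROJECT_OWNERS[project]  # raises KeyError for unregistered projects
    if stage == "dev":
        # Each project develops in its own workspace, writing to the area's dev catalog.
        return {"workspace": f"{project}-dev", "catalog": f"{area}_dev", "schema": project}
    if stage == "prod":
        # Validated assets are promoted into the shared business-area production workspace.
        return {"workspace": f"{area}-prod", "catalog": f"{area}_prod", "schema": project}
    raise ValueError(f"unsupported stage: {stage!r}")


if __name__ == "__main__":
    print(deploy_target("churn_model", "dev"))
    print(deploy_target("churn_model", "prod"))
```

In this design, projects from the same business area converge on one production workspace and catalog. They stay separated by schema, which keeps the production footprint small while development stays isolated per project.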
From the Databricks Unity Catalog governance recommendations:
Use one metastore per region and link all workspaces in that region to it.
Organize catalogs by environment, team, or business unit—catalogs become the main unit of isolation, not separate metastores.
Grant privileges through groups from your identity provider (IdP), not manually per workspace.
Reserve MODIFY access for service principals in production; use CI/CD to promote code and models.
Use managed tables and volumes for most assets, limiting external tables to legacy or cross-platform integrations.
Bind external locations or catalogs to specific workspaces only when isolation requirements demand it.
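Privileges granted on a Unity Catalog catalog are inherited by the schemas and tables inside it. Catalog-level grants to IdP groups therefore cover most needs, and production write access can be limited to the CI/CD service principal. The sketch below generates such `GRANT` statements. The group names, the service principal identifier, and the choice of privilege bundles are assumptions for illustration.

```python
from dataclasses import dataclass

# Assumed privilege bundles; adjust to your own policy.
READ = ("USE CATALOG", "USE SCHEMA", "SELECT")
WRITE = READ + ("MODIFY", "CREATE TABLE")


@dataclass(frozen=True)
class Grant:
    privileges: tuple[str, ...]
    catalog: str
    principal: str

    def to_sql(self) -> str:
        # Backticks quote principal names containing hyphens or UUIDs.
        privs = ", ".join(self.privileges)
        return f"GRANT {privs} ON CATALOG {self.catalog} TO `{self.principal}`;"


def catalog_grants(catalog: str, env: str, readers: str,
                   developers: str, deployer_sp: str) -> list[Grant]:
    """Read access for an IdP group everywhere; write access only for the CI/CD principal in prod."""
    grants = [Grant(READ, catalog, readers)]
    if env == "prod":
        grants.append(Grant(WRITE, catalog, deployer_sp))
    else:
        grants.append(Grant(WRITE, catalog, developers))
    return grants


if __name__ == "__main__":
    # Hypothetical group names and service principal application ID.
    for g in catalog_grants("sales_prod", "prod", "sales-analysts",
                            "sales-engineers", "00000000-0000-0000-0000-000000000000"):
        print(g.to_sql())
```

Keeping grants in code means they are reviewed and applied by the same pipeline in every environment, instead of being clicked together per workspace.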
For machine learning environments with CI/CD:
Manage reproducibility with Delta Lake table versioning and MLflow experiment tracking, with tables and registered models governed by Unity Catalog.
Use Databricks Asset Bundles (YAML) to define environment-specific deployment configurations.
Have CI/CD pipelines promote models through the Unity Catalog model registry, which every workspace attached to the metastore can access.
Adopt Terraform or other IaC tooling to automate the provisioning of workspaces, catalogs, and cluster policies.
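A promotion step from a dev catalog to a prod catalog can be sketched against two `MlflowClient` methods: `copy_model_version` (available in recent MLflow releases, 2.8 or later as far as I know) and `set_registered_model_alias`. Verify both against your MLflow version. The catalog, schema, model, and alias names are hypothetical. The client is passed in so the logic can be tested without a workspace.

```python
from typing import Any, Protocol


class RegistryClient(Protocol):
    """The subset of MlflowClient methods this sketch relies on."""

    def copy_model_version(self, src_model_uri: str, dst_name: str) -> Any: ...

    def set_registered_model_alias(self, name: str, alias: str, version: Any) -> None: ...


def promote_model(client: RegistryClient, model: str, version: str,
                  src_catalog: str = "ml_dev", dst_catalog: str = "ml_prod",
                  schema: str = "models", alias: str = "champion") -> str:
    """Copy a validated model version into the prod catalog and point an alias at the copy."""
    src_name = f"{src_catalog}.{schema}.{model}"  # three-level Unity Catalog name
    dst_name = f"{dst_catalog}.{schema}.{model}"
    copied = client.copy_model_version(
        src_model_uri=f"models:/{src_name}/{version}", dst_name=dst_name
    )
    # Consumers load 'models:/<name>@<alias>', so moving the alias acts as the release step.
    client.set_registered_model_alias(name=dst_name, alias=alias, version=copied.version)
    return f"models:/{dst_name}@{alias}"
```

In a deployment job this would be called with a real client, for example after `mlflow.set_registry_uri("databricks-uc")`: `promote_model(MlflowClient(), "churn_classifier", "3")`. The job would run as the production service principal, since only that identity should hold write privileges on the prod catalog.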
| Organization Strategy | Recommended For | Key Unity Catalog Design |
|---|---|---|
| Single shared set (Dev/Stg/Prod) | Small to medium organizations | Single metastore, catalogs per business area |
| LOB-based | Large enterprises with strict isolation | Multiple workspaces per LOB, one metastore |
| Project/Data Product-based | Cross-functional ML teams | Shared dev/stg workspace, isolated prod per project |
| Hybrid hierarchical | Enterprises with both governance and flexibility needs | Dev workspaces per project, centralized prod catalog |
In 2025, the community and Databricks’ own guidance converge on this principle: use as few workspaces as practical and rely on Unity Catalog for logical data and permission isolation, adding workspaces only where security, compliance, or regional boundaries require them.
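The "as few workspaces as practical" rule can be made concrete as a grouping step. Projects share one Dev/Stg/Prod set unless they sit in a different region or behind a hard compliance boundary. This is an illustrative sketch. The `Project` fields, region names, and workspace naming pattern are assumptions, and each region would still need its own metastore.

```python
from dataclasses import dataclass
from typing import Optional

ENVS = ("dev", "stg", "prod")


@dataclass(frozen=True)
class Project:
    name: str
    region: str
    # Set only when a hard security/compliance requirement forces isolation, e.g. "pci".
    compliance_boundary: Optional[str] = None


def plan_workspaces(projects: list[Project]) -> dict[str, list[str]]:
    """Group projects into one Dev/Stg/Prod workspace set per (region, boundary) pair."""
    groups: dict[tuple[str, str], list[str]] = {}
    for p in projects:
        key = (p.region, p.compliance_boundary or "shared")
        groups.setdefault(key, []).append(p.name)
    plan: dict[str, list[str]] = {}
    for (region, boundary), names in sorted(groups.items()):
        for env in ENVS:
            plan[f"{boundary}-{region}-{env}"] = sorted(names)
    return plan
```

Everything that lands in the same workspace set is then separated by catalogs and schemas rather than by additional workspaces.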
3 weeks ago
Thank you @mark_ott. I tend to believe that a single shared set of workspaces (Development, Staging, Production) would be the best choice to balance simplicity and governance, given our current level of complexity. I also believe that FinOps objectives can be achieved through effective tagging policies.
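The tagging approach mentioned above can be sketched as two checks. The first reports which required tags a cluster spec's `custom_tags` is missing. The second sums usage per tag value. The required tag keys are a hypothetical policy. The usage rows are modeled loosely on the `system.billing.usage` system table (`custom_tags`, `usage_quantity`), so check the actual schema before relying on it.

```python
from collections import defaultdict

# Hypothetical tagging policy; these keys are illustrative, not a Databricks default.
REQUIRED_TAGS = frozenset({"business_area", "project", "env"})


def missing_tags(cluster_spec: dict) -> set:
    """Required tags that are absent from a cluster spec's custom_tags."""
    return set(REQUIRED_TAGS - set(cluster_spec.get("custom_tags") or {}))


def usage_by_tag(usage_rows: list, tag: str) -> dict:
    """Sum usage_quantity per value of one custom tag; untagged usage gets its own bucket."""
    totals = defaultdict(float)
    for row in usage_rows:
        value = (row.get("custom_tags") or {}).get(tag, "untagged")
        totals[value] += row["usage_quantity"]
    return dict(totals)
```

Tracking the size of the "untagged" bucket over time is a simple way to see whether the tagging policy is actually being followed.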