
Unity Catalog design in single workspace: dev/prod catalogs and schemas for projects — should we add

JoaoPigozzo
New Contributor III

Hello everyone,

We are currently designing our Unity Catalog structure and would like feedback on whether our approach makes sense and how it could be improved.

Context:

  • We use a single Databricks workspace shared by Data Engineering and Data Science/ML teams.

  • Unity Catalog is used for governance, access control, and lineage.

  • We follow a layered architecture: bronze / silver / gold.

  • Deployments are handled via CI/CD (PR review and automated promotion to production).

  • Ingestion and core pipelines are owned by Data Engineering; Analytics Engineers and Data Scientists build gold tables for specific use cases in dev and promote via PR.

Current model:
We currently separate only by dev and prod catalogs, combined with layers:

  • dev_bronze, dev_silver, dev_gold

  • prod_bronze, prod_silver, prod_gold

Within each catalog, we use schemas to represent domains and projects, for example:

  • Corporate/domain schemas (shared, governed datasets)

  • Project schemas (owned by Analytics Engineers or Data Scientists for specific use cases)

Development and experimentation happen in dev_*.
Only CI/CD pipelines write to prod_* (no direct human writes).

Questions:

  1. In a single-workspace setup, does this dev/prod catalog split with project-level schemas align with Unity Catalog best practices?

  2. What are the main trade-offs of using schemas to isolate projects versus using separate catalogs for projects or teams?

  3. Would you recommend adding a staging environment (e.g., staging_bronze/silver/gold) between dev and prod?

    • When is staging worth the extra operational complexity and storage cost?

  4. Have you seen successful patterns with only dev + prod in regulated or large-scale environments, and what controls compensated for the lack of staging?

We are mainly looking for architectural and operational arguments based on real-world usage of Unity Catalog with mixed DE and DS/ML workloads in the same workspace.


Louis_Frolio
Databricks Employee

Hey @JoaoPigozzo, great question. This one comes up all the time with the customers I train.

I’ve been doing this for quite a while now and have had the chance to see a wide range of implementations and approaches out in the wild. While there’s no single “best” answer — it really depends on the business context and the goals you’re trying to achieve — there are a few best practices that we feel pretty strongly about.

With that framing in mind, here’s how I generally think about it…

Your current architecture is in a really good place. It aligns cleanly with Unity Catalog best practices and provides a strong, scalable foundation for mixed Data Engineering and Data Science workloads. In particular, the environment-based catalog separation (dev vs. prod), combined with domain or project-level schemas, reflects the most commonly recommended production pattern I see in the field today.

Let’s dig in.

Your dev/prod catalog split, paired with medallion layers (dev_bronze, dev_silver, dev_gold, prod_bronze, prod_silver, prod_gold), follows what I’d call the “gold standard” Unity Catalog approach. You’re using catalogs as the primary unit of isolation, which is the single most important architectural principle in Unity Catalog. Organizing domains and projects at the schema level within those catalogs is exactly where that responsibility belongs—it gives you clean logical organization while preserving fine-grained access control through the UC privilege hierarchy.
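To make the layout concrete, here is a minimal Python sketch that enumerates the environment/layer catalogs and emits the matching CREATE statements. The schema names (finance, churn_model) are hypothetical placeholders, and the helper functions are illustrative rather than any Databricks API; the generated SQL would still need to be run by a principal allowed to create catalogs.

```python
from itertools import product

ENVIRONMENTS = ["dev", "prod"]
LAYERS = ["bronze", "silver", "gold"]


def catalog_names(environments=ENVIRONMENTS, layers=LAYERS):
    """Return one catalog name per environment/layer pair, e.g. dev_bronze."""
    return [f"{env}_{layer}" for env, layer in product(environments, layers)]


def create_statements(schemas_by_catalog):
    """Build CREATE CATALOG / CREATE SCHEMA statements from a catalog -> schemas mapping."""
    statements = []
    for catalog, schemas in schemas_by_catalog.items():
        statements.append(f"CREATE CATALOG IF NOT EXISTS {catalog}")
        for schema in schemas:
            statements.append(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}")
    return statements


# Hypothetical schemas: one shared domain schema and one project schema per catalog.
layout = {name: ["finance", "churn_model"] for name in catalog_names()}
for statement in create_statements(layout):
    print(statement)
```

Keeping the layout in one mapping means dev and prod cannot drift apart in structure, since both are generated from the same definition.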

I also want to call out the governance signal here: CI/CD pipelines being the only writers to prod_* catalogs, with humans developing exclusively in dev_*. That’s a strong, intentional pattern, and it’s one I consistently see in mature platforms.

On the question of schema versus catalog isolation, your current choice is the right default.

Using schemas for project isolation gives you several advantages. It simplifies privilege management by allowing team-level USE SCHEMA grants without multiplying catalogs. It reduces operational overhead and avoids the “catalog explosion” anti-pattern. It preserves lineage visibility and makes cross-project analysis far more natural—especially important when teams share upstream data or need to join across domains.

Separate catalogs per project or team do make sense in specific cases—typically when compliance requires physical storage separation or when workspace-catalog binding must be enforced very strictly. The trade-off is increased operational complexity, harder cross-catalog queries, and fragmented lineage. For most organizations, schemas are the right tool unless compliance explicitly forces a different answer.

In short: use schemas for project isolation unless you have a clear, documented requirement for physical data separation.
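As a rough illustration of schema-level isolation, the sketch below builds the GRANT statements a team would need to read one project schema. The group name finance-analysts and the choice of SELECT as the data privilege are assumptions made for the example; note that the team also needs USE CATALOG on the parent catalog before USE SCHEMA has any effect.

```python
def schema_grants(catalog, schema, group, privileges=("USE SCHEMA", "SELECT")):
    """Build GRANT statements giving a group access to one schema in one catalog."""
    target = f"{catalog}.{schema}"
    # USE CATALOG on the parent is a prerequisite for reaching anything inside it.
    statements = [f"GRANT USE CATALOG ON CATALOG {catalog} TO `{group}`"]
    statements += [
        f"GRANT {privilege} ON SCHEMA {target} TO `{group}`"
        for privilege in privileges
    ]
    return statements


# Hypothetical group and schema names.
for statement in schema_grants("prod_gold", "finance", "finance-analysts"):
    print(statement)
```

Adding a new project then means one more schema and one more call to this helper, with no new catalog to create or bind.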

On staging environments, the answer is situational rather than prescriptive.

Staging adds the most value in regulated environments with formal approval gates, organizations with strict change-control processes, or teams deploying complex transformations that genuinely benefit from prod-scale integration testing before release. It can also help when CI/CD maturity is still evolving and additional safety nets are required.

That said, many teams operate very successfully with just dev and prod. If you have strong CI/CD, automated testing, disciplined code reviews, and reliable rollback strategies, staging often provides diminishing returns—especially when weighed against the cost of duplicating storage and compute.

If you choose not to add staging, there are solid operational controls that compensate well. These include robust automated testing in dev using production-scale samples, feature flags or blue-green deployment patterns, mandatory senior review for production changes, proactive monitoring at the table level, and strict enforcement of service principals (not users) for all production writes.
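The last control in that list, service principals only for production writes, is easy to check automatically. The sketch below assumes you have already exported grants into plain dictionaries; the record shape, the principal names, and the set of privileges treated as "write" are all assumptions for illustration, not the output format of any Databricks API.

```python
# Privileges treated as write access for this check (an assumption; adjust to your policy).
WRITE_PRIVILEGES = {"MODIFY", "CREATE TABLE", "ALL PRIVILEGES"}


def human_writers_on_prod(grants):
    """Return grants that give a non-service-principal write access to a prod_* catalog."""
    return [
        grant
        for grant in grants
        if grant["catalog"].startswith("prod_")
        and grant["privilege"] in WRITE_PRIVILEGES
        and grant["principal_type"] != "service_principal"
    ]


# Hypothetical exported grants.
grants = [
    {"principal": "cicd-deployer", "principal_type": "service_principal",
     "privilege": "MODIFY", "catalog": "prod_gold"},
    {"principal": "analyst@example.com", "principal_type": "user",
     "privilege": "SELECT", "catalog": "prod_gold"},
    {"principal": "analyst@example.com", "principal_type": "user",
     "privilege": "MODIFY", "catalog": "prod_silver"},
    {"principal": "analyst@example.com", "principal_type": "user",
     "privilege": "MODIFY", "catalog": "dev_gold"},
]

violations = human_writers_on_prod(grants)
for grant in violations:
    print(f"{grant['principal']} has {grant['privilege']} on {grant['catalog']}")
```

Run as a scheduled job or a CI step, a check like this turns the "no direct human writes" rule from a convention into something that fails loudly when it is broken.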

For mixed DE and DS workloads, one real-world pattern I see work extremely well is a hybrid schema model.

Bronze and silver schemas are organized by data producer or source system—things like salesforce, stripe, or application event streams—and are owned by Data Engineering. Gold schemas, on the other hand, are organized by business domain or product area, such as finance or marketing, and are typically owned by Analytics Engineers and Data Scientists.

This pattern solves a common pain point: bronze-to-silver-to-gold pipelines don’t always map cleanly to a single schema boundary. Separating producer-oriented upstream layers from consumer-oriented downstream layers preserves clarity, ownership, and lineage. In practice, Analytics and DS teams develop in dev_gold schemas and promote via CI/CD, while Data Engineering maintains control over upstream ingestion and refinement.
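The hybrid model can be written down as a small mapping: source-system schemas in bronze and silver, business-domain schemas in gold, each with an owning group. The schema names come from the examples above; the owner group names (data-engineering, analytics-engineering) are hypothetical.

```python
# Schemas per layer: producer-oriented upstream, domain-oriented downstream.
SCHEMAS_BY_LAYER = {
    "bronze": ["salesforce", "stripe"],
    "silver": ["salesforce", "stripe"],
    "gold": ["finance", "marketing"],
}

# Hypothetical owning groups per layer.
OWNER_BY_LAYER = {
    "bronze": "data-engineering",
    "silver": "data-engineering",
    "gold": "analytics-engineering",
}


def qualified_schemas(env):
    """Return (catalog.schema, owner group) pairs for one environment."""
    return [
        (f"{env}_{layer}.{schema}", OWNER_BY_LAYER[layer])
        for layer, schemas in SCHEMAS_BY_LAYER.items()
        for schema in schemas
    ]


for name, owner in qualified_schemas("dev"):
    print(f"{name} owned by {owner}")
```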

A few additional governance refinements to consider as you continue to mature the platform:

  • Transfer ownership of production catalogs and schemas to groups, not individual users.

  • Use service principals for all CI/CD writes to production.

  • Grant BROWSE broadly to support discoverability, while keeping USE CATALOG and USE SCHEMA tightly scoped.

  • If you need full physical separation between environments, managed storage locations at the catalog level are a clean way to achieve it.
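A minimal sketch of those refinements as generated SQL, assuming hypothetical group names (platform-admins, all-users) and a hypothetical storage path. The managed location is set in the CREATE CATALOG statement here, so it only applies to catalogs that do not exist yet.

```python
def governance_statements(catalog, owner_group, browse_group, managed_location=None):
    """Build statements for group ownership, broad BROWSE, and an optional managed location."""
    statements = []
    if managed_location:
        # Catalog-level managed storage keeps this environment's data in its own location.
        statements.append(
            f"CREATE CATALOG IF NOT EXISTS {catalog} MANAGED LOCATION '{managed_location}'"
        )
    statements.append(f"ALTER CATALOG {catalog} OWNER TO `{owner_group}`")
    statements.append(f"GRANT BROWSE ON CATALOG {catalog} TO `{browse_group}`")
    return statements


# Hypothetical groups and storage path.
for statement in governance_statements(
    "prod_gold",
    owner_group="platform-admins",
    browse_group="all-users",
    managed_location="s3://example-bucket/prod_gold",
):
    print(statement)
```

USE CATALOG and USE SCHEMA are deliberately left out of this helper so that discoverability (BROWSE) and actual access stay separate decisions.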

Net-net: your architecture is well aligned with industry patterns and Unity Catalog design principles. The main opportunity for refinement, if it’s not already in place, is adopting the hybrid schema model (producer-oriented bronze and silver, domain-oriented gold) to better support mixed DE, analytics, and ML workloads at scale.

Cheers, Lou.

 
