
Unity Catalog + Medallion Architecture

shane_t
New Contributor

I am looking for a reference architecture or an example of how to organize Unity Catalog while adhering to the medallion architecture.

What are some common naming conventions and methods?

How do you isolate environments (dev/prod)?

I was thinking of something like this:

- One "catalog" per environment.

- One "schema" per set of tables, with gold/silver/bronze versions

Example:

lakehouse_dev (catalog)

- bronze_widgets (schema)

  -- table1_bronze, table2_bronze, etc

- silver_widgets (schema)

  -- table1_silver, table2_silver, etc

- gold_widgets (schema)

  -- table1_gold, table2_gold, etc

lakehouse_prod (catalog)

- bronze_widgets (schema)

  -- table1_bronze, table2_bronze, etc

- silver_widgets (schema)

  -- table1_silver, table2_silver, etc

- gold_widgets (schema)

  -- table1_gold, table2_gold, etc

I'd love to get some other people's thoughts on how they have implemented this in their organization. Thanks!

3 REPLIES

Kaniz
Community Manager

Hi @shane_t, Your approach to organizing the Unity Catalog adheres to the Medallion Architecture and is a common practice. 

 

Medallion Architecture:

  • It’s a data design pattern used to logically organize data in a lakehouse.
  • The goal is to incrementally and progressively improve the structure and quality of data as it flows through each layer of the architecture (from Bronze ⇒ Silver ⇒ Gold layer tables).

Unity Catalog Organization:

  • Unity Catalog provides a common namespace that allows you to govern and audit your data in one place.
  • The hierarchy of primary data objects flows from metastore to table or volume.
  • You reference all data in Unity Catalog using a three-level namespace: catalog.schema.asset, where asset can be a table, view, or volume.
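
As a rough illustration of the three-level namespace, here is a minimal Python sketch that assembles a fully qualified name from its parts. The helper name `qualified_name` is hypothetical, and the sketch assumes plain identifiers (letters, digits, underscores); the example names come from the layout proposed in the question.

```python
import re

# Assumption: plain identifiers only (letters, digits, underscores). Names with
# other characters would need backtick quoting, which this sketch does not handle.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def qualified_name(catalog: str, schema: str, asset: str) -> str:
    """Join catalog, schema, and asset into a catalog.schema.asset name."""
    for part in (catalog, schema, asset):
        if not _IDENT.match(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    return f"{catalog}.{schema}.{asset}"


# Names taken from the layout proposed in the question.
print(qualified_name("lakehouse_dev", "silver_widgets", "table1_silver"))
```

Because the environment only appears in the catalog part, the same query logic can point at dev or prod by swapping a single value.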

Naming Conventions:

  • Project-based catalogs: Define a catalog for each project. This works well when there is little overlap in source data.
  • Shared Zone Catalogs: Create one catalog named bronze and create schemas under it, one per source system. This approach could even be extended to a common silver layer.
  • Role-Based Catalogs: Create multiple catalogs based on the role the consumer plays within your organization.
  • Environment-Based Catalogs: Catalogs are environment-specific (dev / test / prod) and layer-specific (bronze / silver / gold).
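
To make two of these conventions concrete, the sketch below generates the catalog and schema names for the environment-based layout from the question and for the shared-zone variant. The function names, the `lakehouse` prefix default, and the source systems `crm` and `erp` are illustrative assumptions, not an official convention.

```python
from __future__ import annotations

ENVIRONMENTS = ("dev", "prod")
LAYERS = ("bronze", "silver", "gold")


def environment_catalog_layout(domain: str, prefix: str = "lakehouse") -> dict[str, list[str]]:
    """One catalog per environment, one schema per layer for the given domain."""
    return {
        f"{prefix}_{env}": [f"{layer}_{domain}" for layer in LAYERS]
        for env in ENVIRONMENTS
    }


def shared_zone_layout(source_systems: list[str]) -> dict[str, list[str]]:
    """Shared-zone variant: one catalog per layer, one schema per source system."""
    return {layer: list(source_systems) for layer in LAYERS}


# Layout from the question: lakehouse_dev / lakehouse_prod with *_widgets schemas.
for catalog, schemas in environment_catalog_layout("widgets").items():
    print(catalog, schemas)

# Hypothetical source systems, for illustration only.
print(shared_zone_layout(["crm", "erp"]))
```

Generating the names from one place makes it harder for dev and prod to drift apart in spelling or structure.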

Isolating Environments:

  • Unity Catalog offers isolation mechanisms within the namespace for needs that organizations have traditionally addressed using multiple Hive metastores.
  • These isolation mechanisms enable groups to operate independently with minimal or no interaction and also allow them to achieve isolation in other scenarios, such as production vs development environments.
  • Isolation standards might vary for your organization, but typically they include the following expectations: users can only gain access to data based on specified access rules; data can be managed only by designated people or teams; data is physically separated in storage; and data can be accessed only in designated environments.
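
The first expectation, access based on specified rules, could be sketched as a set of read-only grants scoped to a single environment catalog. The snippet below only builds the SQL strings; the group name `prod_readers` and the helper `read_only_grants` are hypothetical. It does not address physical storage separation or restricting catalogs to particular workspaces, which are configured separately.

```python
def read_only_grants(catalog: str, schemas: list[str], group: str) -> list[str]:
    """Build GRANT statements giving `group` read-only access to the listed schemas."""
    statements = [f"GRANT USE CATALOG ON CATALOG {catalog} TO `{group}`"]
    for schema in schemas:
        # USE SCHEMA lets the group reach the schema; SELECT lets it read the tables in it.
        statements.append(f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{group}`")
        statements.append(f"GRANT SELECT ON SCHEMA {catalog}.{schema} TO `{group}`")
    return statements


# Hypothetical group: consumers who may read only the gold layer in prod.
for statement in read_only_grants("lakehouse_prod", ["gold_widgets"], "prod_readers"):
    print(statement)
```

Granting at the schema level keeps the rules short, since tables created later in that schema are covered without further grants.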

Implementing Unity Catalog in an Organization:

  • Confirm that your workspace is enabled for Unity Catalog.
  • Add users and assign the workspace admin role.
  • Create clusters or SQL warehouses that users can use to run queries and create objects.
  • Grant privileges to users.
  • Create new catalogs and schemas.
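
The last step above, creating catalogs and schemas, might look like the following sketch for the layout in the question. It only builds the SQL text; the helper name `bootstrap_statements` is hypothetical. In a Databricks notebook each string could then be passed to `spark.sql(...)`, or the printed script pasted into the SQL editor, assuming you hold the privileges needed to create catalogs.

```python
def bootstrap_statements(catalog: str, schemas: list[str]) -> list[str]:
    """Build idempotent CREATE statements for one catalog and its schemas."""
    statements = [f"CREATE CATALOG IF NOT EXISTS {catalog}"]
    statements += [f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}" for schema in schemas]
    return statements


# Layout from the question: one catalog per environment, one schema per layer.
schemas = ["bronze_widgets", "silver_widgets", "gold_widgets"]
for catalog in ("lakehouse_dev", "lakehouse_prod"):
    print(";\n".join(bootstrap_statements(catalog, schemas)) + ";")
```

Using `IF NOT EXISTS` keeps the script safe to rerun, which helps when the same bootstrap is applied to each environment.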

 

I hope this information helps! Let me know if you have any other questions.

JN_Bristol
New Contributor II

Advancing Analytics have a good (although contrarian) take on the medallion architecture: Behind the Hype - The Medallion Architecture Doesn't Work - YouTube

I found that vid very helpful.

Kaniz
Community Manager

Hey there! Thanks a bunch for being part of our awesome community! 🎉 

We love having you around and appreciate all your questions. Take a moment to check out the responses; you'll find some great info. Your input is valuable, so pick the best solution for you. And remember, if you ever need more help, we're here for you!

Keep being awesome! 😊🚀

 
