Unity Catalog for medallion architecture

mbravonxp
New Contributor II

Hello community.

I need help defining the most suitable Unity Catalog setup. I have the following storage architecture in Azure Data Lake Storage:

  • I have data from different clients
  • I work with 3 different environments for each client: dev, pre, pro
  • I need to implement a medallion architecture (bronze, silver and gold) for each environment

I need to read, write and work with the data from Databricks. The Azure storage receives new data daily.

What would be the best approach considering catalogs, schemas, external volumes, tables and so on?

Thanks in advance.

Accepted solutions (2)

Alberto_Umana
Databricks Employee

Hello @mbravonxp,

Create a separate catalog for each client to logically isolate their data. This helps in managing permissions and organizing data efficiently.

Within each catalog, create schemas for each environment (dev, pre, pro). This helps manage the data lifecycle and access control for the different stages of development.

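  • Implement the medallion architecture with separate bronze, silver, and gold layers within each environment. Because Unity Catalog uses a three-level namespace (catalog.schema.table), the layers cannot be schemas nested inside an environment schema; instead, use schemas that combine environment and layer (for example, dev_bronze), or separate tables per layer within each environment schema. This helps organize the data processing pipeline and maintain data quality.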
    • Bronze: Raw data ingestion.
    • Silver: Cleaned and enriched data.
    • Gold: Aggregated and business-level data.
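A minimal sketch of this layout, assuming one catalog per client and schemas named `<env>_<layer>` to stay within Unity Catalog's three namespace levels. The client name `client_a` and the naming convention are hypothetical. The function only builds Databricks SQL DDL strings, so it runs anywhere; on Databricks you could pass each statement to `spark.sql(...)`. Depending on how your metastore is configured, `CREATE CATALOG` may also need a `MANAGED LOCATION` clause.

```python
# Hypothetical naming: one catalog per client, one schema per environment/layer pair.
ENVIRONMENTS = ("dev", "pre", "pro")
LAYERS = ("bronze", "silver", "gold")


def client_catalog_ddl(client: str) -> list[str]:
    """Return DDL for one client catalog with an '<env>_<layer>' schema per combination."""
    statements = [f"CREATE CATALOG IF NOT EXISTS {client}"]
    for env in ENVIRONMENTS:
        for layer in LAYERS:
            # Unity Catalog has only catalog.schema.table, so environment and
            # layer are folded into a single schema name.
            statements.append(f"CREATE SCHEMA IF NOT EXISTS {client}.{env}_{layer}")
    return statements


if __name__ == "__main__":
    for stmt in client_catalog_ddl("client_a"):
        # On Databricks: spark.sql(stmt)
        print(stmt)
```

Each client ends up with one catalog and nine schemas, which keeps client isolation at the catalog level as the answer suggests.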

Use Databricks jobs (workflows) to automate data processing and the daily updates of tables in the bronze, silver, and gold layers.
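One way to reuse the same job code across dev, pre and pro is to pass the client and environment as job parameters and derive every table name from them. The sketch below assumes the hypothetical `<env>_<layer>` schema naming from this answer; `table_names` and the table name `orders` are illustrative, not a Databricks API.

```python
# Hypothetical helper: derive fully qualified table names from job parameters.
VALID_ENVS = {"dev", "pre", "pro"}
LAYERS = ("bronze", "silver", "gold")


def table_names(client: str, env: str, table: str) -> dict[str, str]:
    """Map each medallion layer to the catalog.schema.table name for one environment."""
    if env not in VALID_ENVS:
        # Fail fast on a mistyped environment parameter.
        raise ValueError(f"unknown environment: {env!r}")
    # Assumes one catalog per client and '<env>_<layer>' schemas.
    return {layer: f"{client}.{env}_{layer}.{table}" for layer in LAYERS}


if __name__ == "__main__":
    names = table_names("client_a", "dev", "orders")
    print(names["bronze"])  # client_a.dev_bronze.orders
```

Inside a Databricks notebook task, the returned names could feed `spark.read.table(names["bronze"])` and `df.write.mode("append").saveAsTable(names["silver"])`, so promoting code from dev to pro only changes the `env` parameter.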

Best Practices:

  • Avoid giving direct storage-level access to users for Unity Catalog managed tables or volumes to maintain data security and governance.
  • Co-locate your Databricks workspace, metastore, and storage in the same Azure region for optimal performance.
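Instead of handing out storage keys or SAS tokens, access can be granted through Unity Catalog privileges. A minimal sketch that builds read-only grants for one schema; the catalog, schema and group names are hypothetical. `SELECT` granted on a schema applies to the tables in it, and the group also needs `USE CATALOG` and `USE SCHEMA` to reach them.

```python
# Hypothetical catalog/schema/group names; builds Databricks SQL GRANT statements.
def read_only_grants(catalog: str, schema: str, group: str) -> list[str]:
    """Grant a group read access to one schema via Unity Catalog privileges."""
    principal = f"`{group}`"  # backticks allow spaces or dashes in group names
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {principal}",
        f"GRANT SELECT ON SCHEMA {catalog}.{schema} TO {principal}",
    ]


if __name__ == "__main__":
    for stmt in read_only_grants("client_a", "pro_gold", "analysts"):
        # On Databricks: spark.sql(stmt)
        print(stmt)
```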


filipniziol
Esteemed Contributor

Hi @mbravonxp,

In this case, the best approach is to have one catalog per client per environment (three catalogs per client; for example, 9 catalogs for 3 clients).
In every catalog, create bronze, silver and gold schemas.
Additionally, each catalog should have its own storage location, and you may consider having a separate workspace for each client and environment.
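A minimal sketch of this layout, assuming catalogs named `<client>_<env>`, one ADLS container per environment and one folder per client. The storage account, container and client names are hypothetical, and the sketch assumes an external location covering these `abfss://` paths already exists, since a catalog's managed location must fall under one. It only builds Databricks SQL DDL strings.

```python
from itertools import product

ENVIRONMENTS = ("dev", "pre", "pro")
LAYERS = ("bronze", "silver", "gold")


def catalog_ddl(client: str, env: str, storage_account: str) -> list[str]:
    """DDL for one client/environment catalog with its own managed storage path."""
    catalog = f"{client}_{env}"
    # Hypothetical layout: container per environment, folder per client.
    location = f"abfss://{env}@{storage_account}.dfs.core.windows.net/{client}"
    statements = [f"CREATE CATALOG IF NOT EXISTS {catalog} MANAGED LOCATION '{location}'"]
    statements += [f"CREATE SCHEMA IF NOT EXISTS {catalog}.{layer}" for layer in LAYERS]
    return statements


def all_catalogs(clients: list[str], storage_account: str) -> dict[str, list[str]]:
    """One catalog per (client, environment) pair: len(clients) * 3 catalogs."""
    return {
        f"{client}_{env}": catalog_ddl(client, env, storage_account)
        for client, env in product(clients, ENVIRONMENTS)
    }


if __name__ == "__main__":
    catalogs = all_catalogs(["client_a", "client_b", "client_c"], "mystorageacct")
    print(len(catalogs))  # 3 clients x 3 environments
```

With three clients this yields nine catalogs; the count grows by three for every additional client.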

Check my answer to the similar topic on the forum:
https://community.databricks.com/t5/community-platform-discussions/unity-catalog-implementation/td-p...

Replies

mbravonxp
New Contributor II

Hi both,

Thanks very much for the useful replies. I will definitely go with your suggestions.

Best.
