Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unity Catalog: Metastore 3-Level Hierarchy

VasuKumarT
New Contributor

I have data files categorized by application and region. I want to know the best way to load them into the Bronze and Silver layers while maintaining proper segregation.

For example, our landing zone holds raw files from multiple applications and regions, structured as shown below, which are to be loaded using Auto Loader. With the three-level naming convention in Unity Catalog, I am concerned that it will be difficult to trace the table-to-file mapping in the future. Can you suggest the best possible architecture for this and explain its merits and internal organization? Thank you for your time and assistance.

E.g., below is the landing zone structure of the raw files to be loaded using Auto Loader.

        App1 : Region1 : File1

        App1 : Region1 : File2

        App2 : Region1 : File1

        App2 : Region1 : File2

There are multiple applications with around 11 regions, and approximately 100 files to be loaded daily.

 

1 REPLY

Shazaamzaa
New Contributor III

If I understand correctly, you have source files partitioned by application and region in cloud storage that you want to load, and you would like some suggestions on the Unity Catalog structure. It will definitely depend on how you want the data to be consumed and on your security/access control requirements. I've added some assumptions to get you started.

1. Is there a need to view the data across regions or across apps? If unsure, I would use the smallest multiple of the group for catalog-level separation. For example, I would create a catalog per application and a schema per region, with the tables under them (assuming each region has more than one table), which gives names like `app_1.region_1.table_1`.

2. Does access need to be controlled per region or per app? If you follow the above recommendation, you can set access control at the catalog level for each app, or at the schema level for each region.
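To make the two points above concrete, here is a minimal Python sketch of how the catalog-per-app, schema-per-region layout could be derived from the landing folder names, together with the matching Unity Catalog SQL. The helper names (`to_identifier`, `table_name`, `setup_statements`) and the group names are hypothetical, and the identifier normalization rule is only an assumption. The statements are returned as strings so you can review them before running each one with `spark.sql(...)`.

```python
import re


def to_identifier(raw: str) -> str:
    """Lower-case a folder name and turn runs of other characters into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", raw.strip().lower()).strip("_")


def table_name(app: str, region: str, dataset: str) -> str:
    """Build the three-level name: <catalog per app>.<schema per region>.<table>."""
    return ".".join(to_identifier(part) for part in (app, region, dataset))


def setup_statements(app: str, region: str, app_group: str, region_group: str) -> list[str]:
    """Return Unity Catalog SQL for one app/region pair (group names are hypothetical)."""
    catalog = to_identifier(app)
    schema = f"{catalog}.{to_identifier(region)}"
    return [
        f"CREATE CATALOG IF NOT EXISTS {catalog}",
        f"CREATE SCHEMA IF NOT EXISTS {schema}",
        # App-scoped group: privileges granted on the catalog are inherited
        # by every schema and table inside it.
        f"GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG {catalog} TO `{app_group}`",
        # Region-scoped group: may enter the catalog but reads only one schema.
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{region_group}`",
        f"GRANT USE SCHEMA, SELECT ON SCHEMA {schema} TO `{region_group}`",
    ]
```

For example, `table_name("App1", "Region1", "File1")` yields `app1.region1.file1`, and `setup_statements("App1", "Region1", "app1_readers", "app1_region1_readers")` yields the five statements for that pair.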

You will need to consider your specific usage requirements to help determine the catalog structure. Hope this helps.
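On the concern about tracing tables back to files, one option is to store each row's source file path in the Bronze table at ingestion time. The sketch below assumes a landing layout of `<root>/<app>/<region>/`, CSV input, and one checkpoint directory per table. The function names and the `source_file` column name are hypothetical. It relies on the `_metadata.file_path` field that Spark exposes for file-based sources, and it only runs on a Databricks cluster where the `cloudFiles` (Auto Loader) source is available.

```python
def landing_path(root: str, app: str, region: str) -> str:
    """Folder that holds one app/region's raw files (layout is an assumption)."""
    return f"{root.rstrip('/')}/{app}/{region}/"


def ingest_with_lineage(spark, source_dir, target_table, checkpoint_dir, file_format="csv"):
    """Load one landing folder with Auto Loader and keep the source file per row."""
    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", file_format)
        # Auto Loader stores the inferred schema here; reusing the checkpoint
        # directory is a simplification for this sketch.
        .option("cloudFiles.schemaLocation", checkpoint_dir)
        .load(source_dir)
        # _metadata.file_path is the full path of the file each row came from.
        .selectExpr("*", "_metadata.file_path AS source_file")
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_dir)
        .trigger(availableNow=True)  # process everything pending, then stop
        .toTable(target_table)
    )
```

With that column in place, a query such as `SELECT DISTINCT source_file FROM app1.region1.file1` lists the files behind a given table.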
