
I am new to Databricks and am setting up Unity Catalog. I have a few best-practice questions.

Avvar2022
New Contributor III

  1. Is it best practice to keep the ADLS Gen2 storage for the Unity Catalog metastore separate from the ADLS Gen2 storage that holds the data?
  2. Since only one metastore can be created per region, will there be separate metastores for PROD and NON-PROD (QA and DEV)? If so, they would need to be in separate regions.
  3. My understanding is that one metastore can be configured with one ADLS Gen2 account and one container. Does this mean all environments use that one container? In a multi-subscription environment, where should the storage account be hosted?

Thank you in advance for your help!

1 ACCEPTED SOLUTION


Anonymous
Not applicable

@Ashok Zubrewar Please find the answers inline.

  • Is it best practice to keep the ADLS Gen2 storage for the Unity Catalog metastore separate from the ADLS Gen2 storage that holds the data?
    • It's not mandatory, but it's better to keep the UC ADLS path separate from other data to avoid management overhead.
  • Since only one metastore can be created per region, will there be separate metastores for PROD and NON-PROD (QA and DEV)? If so, they would need to be in separate regions.
    • There is no need to create a separate metastore for each environment. Instead, isolate the environments by giving each catalog its own managed location: create the DEV catalog with a managed location in its own container, and create the catalogs for the other environments with different containers as their managed locations. That way the data of each environment is isolated. Please refer to the doc below.

https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/create-metastore

  • My understanding is that one metastore can be configured with one ADLS Gen2 account and one container. Does this mean all environments use that one container? In a multi-subscription environment, where should the storage account be hosted?
    • The previous answer partially covers this. You can isolate the environments at the container level or by using different ADLS storage accounts. Make sure you have set up storage credentials and external locations for those paths before creating catalogs with managed locations.
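To make the per-environment isolation concrete, here is a minimal sketch that builds the SQL for one external location and one catalog per environment. The storage account name `examplestorageacct`, the credential name `example_credential`, and the convention of naming the container after the environment are all assumptions for illustration, and the storage credential is assumed to exist already. It only builds the statements as strings, so it can be run anywhere.

```python
def abfss_url(container: str, account: str) -> str:
    """Build the abfss:// URL for the root of an ADLS Gen2 container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/"


def catalog_setup_sql(env: str, account: str, credential: str) -> list[str]:
    """Return the statements for one environment, in the order they must run.

    Assumption: each environment gets a container named after it
    (dev, qa, prod) and the storage credential already exists.
    """
    url = abfss_url(env, account)
    return [
        # The external location must exist before the catalog refers to its path.
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {env}_location "
        f"URL '{url}' WITH (STORAGE CREDENTIAL {credential})",
        # The catalog's managed tables are then stored under this container.
        f"CREATE CATALOG IF NOT EXISTS {env} MANAGED LOCATION '{url}'",
    ]


# Hypothetical names, used only for illustration.
statements = [
    stmt
    for env in ("dev", "qa", "prod")
    for stmt in catalog_setup_sql(env, "examplestorageacct", "example_credential")
]

for stmt in statements:
    print(stmt)
```

In a Databricks notebook, each statement could then be passed to `spark.sql(stmt)`. To isolate by storage account rather than by container, pass a different account name per environment.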


4 REPLIES 4

Avvar2022
New Contributor III

Thank you so much for the response. Based on your answers, I have also set up a POC metastore and now understand catalog separation.

1. I am clear.

2. I am clear.

3. I am still not clear. Although it is not mandatory to have two separate ADLS Gen2 storage accounts, let's assume that in our case we have decided to have two:

One - UC Catalog

Second - to store data (catalog/schema/tables)

In a multi-subscription environment, where should the UC path (ADLS Gen2 storage account) be hosted: in the Prod subscription or the Non-Prod subscription? Or does it not matter, so that we can host it in either subscription as long as strict access controls are in place? Since we are starting from scratch, we would like some feedback on best practice.

karthik_p
Esteemed Contributor

@Ashok Zubrewar Coming to your third question: if you are using any external tables, then a non-UC ADLS Gen2 account is mandatory. You cannot use the UC ADLS Gen2 account for them, as it hosts metadata and managed table data. There is no restriction on where your external storage lives (the ADLS Gen2 storage does not have to be in the same region as the UC ADLS Gen2 account, but to avoid performance issues it is best to keep it in the same region).

Once you configure your non-UC ADLS Gen2 account and add a storage credential and an external location, you should be able to access it from UC. Keep in mind, though, that UC currently has limitations for external tables (optimization is not supported), and Databricks recommends using managed tables. It depends on the use case, however: since we mostly use external tables for analytics, we may not be able to avoid them.
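As a rough sketch of what an external table on the non-UC storage account could look like, the snippet below builds a `CREATE TABLE ... LOCATION` statement. The catalog, schema, table, column, container and account names are all hypothetical, and the path is assumed to sit under an external location that has already been registered in UC.

```python
def external_table_sql(full_name: str, columns: dict[str, str], location: str) -> str:
    """Build a CREATE TABLE statement for a Delta table stored at an explicit path.

    Giving an explicit LOCATION is what makes the table external: the data
    stays at that path rather than in the catalog's managed location.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {full_name} ({cols}) "
        f"USING DELTA LOCATION '{location}'"
    )


# Hypothetical names, used only for illustration.
stmt = external_table_sql(
    "dev.sales.orders",
    {"order_id": "BIGINT", "amount": "DOUBLE"},
    "abfss://external@exampledataacct.dfs.core.windows.net/sales/orders",
)
print(stmt)
```

In a Databricks notebook the statement could be run with `spark.sql(stmt)`.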
