Hubert-Dudek
Esteemed Contributor III

Unity Catalog: create the first metastore

The great benefit of Unity Catalog is that the data is ours, stored in an open format on cloud storage in our own container. To set up Unity Catalog, we need to create storage and give Databricks access to that storage so the metastore can be created through the admin console.

In this guide, we will use Azure and Azure Data Lake Storage Gen2.

Storage account

We need to search for "Storage accounts" in the Azure portal.

On the Storage accounts page, we hit the Create button.

On the next page, the most important setting is to use the same region as our Databricks workspace. On the Advanced page, enable the hierarchical namespace so the account is created as Data Lake Storage Gen2.

image.pngWe need go to create a storage account, and we need to create a container on which we will store metastore.

image.png 

We need to remember the storage account and container names, as we will use them later in the metastore settings as <container_name>@<storage_account_name>.dfs.core.windows.net. Copy and save them so we can use them later.
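To avoid typos when assembling that path by hand, a small helper can build it from the two names. This is a hypothetical sketch, not part of the guide; the function name and the simplified naming checks are my own assumptions:

```python
# Hypothetical helper (not part of the guide) that builds the ADLS Gen2 path
# used later in the metastore settings: <container>@<account>.dfs.core.windows.net/
import re

def metastore_path(container: str, account: str) -> str:
    """Build the ADLS Gen2 path string for the metastore root."""
    # Simplified checks of Azure naming rules: storage accounts are 3-24
    # lowercase letters/digits; container names may also contain hyphens.
    if not re.fullmatch(r"[a-z0-9]{3,24}", account):
        raise ValueError(f"invalid storage account name: {account!r}")
    if not re.fullmatch(r"[a-z0-9-]{3,63}", container):
        raise ValueError(f"invalid container name: {container!r}")
    return f"{container}@{account}.dfs.core.windows.net/"

print(metastore_path("metastore", "unitystorage"))
# -> metastore@unitystorage.dfs.core.windows.net/
```

The trailing slash is included deliberately, since the metastore creation form expects a path to the root directory of the container.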

 

Access Connector for Azure Databricks

Now we need to give Databricks access to our storage. To achieve that, we search for "Access Connector for Azure Databricks".

We hit Create and remember, again, to use the same region.

After the creation is complete, we must go to the newly created resource and copy the ID of the Access Connector. It is quite long, in the format /subscriptions/<YOUR_SUBSCRIPTION_ID>/resourceGroups/<YOUR_RESOURCE_GROUP>/providers/Microsoft.Databricks/accessConnectors/<ACCESS_CONNECTOR_NAME>. Copy and save it so we can use it later.

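Because that resource ID is long and easy to mangle when copying, a quick sanity check of its layout can help. This is a hypothetical validator of my own, assuming only the resource-ID format quoted above:

```python
# Hypothetical check (not from the guide) that a pasted Access Connector ID
# matches the expected Azure resource-ID layout before it is used in Databricks.
import re

CONNECTOR_ID_PATTERN = re.compile(
    r"^/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Databricks"
    r"/accessConnectors/(?P<name>[^/]+)$"
)

def parse_connector_id(resource_id: str) -> dict:
    """Return the ID's parts, or raise ValueError if the format is wrong."""
    match = CONNECTOR_ID_PATTERN.match(resource_id.strip())
    if match is None:
        raise ValueError(f"not an Access Connector resource ID: {resource_id!r}")
    return match.groupdict()

parts = parse_connector_id(
    "/subscriptions/1234/resourceGroups/rg-uc/providers/"
    "Microsoft.Databricks/accessConnectors/uc-connector"
)
print(parts["name"])  # -> uc-connector
```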

Grant access to the storage account

Okay, now we need to go back to the storage account we created for Unity Catalog. Inside the storage account, in the left menu, please click "Access Control (IAM)" and then "+ Add."

We need to select the role "Storage Blob Data Contributor".

Next, we need to select the previously created Access Connector; it is registered as a managed identity. We choose it and hit "Select" and "Review + Assign".


Creating metastore

Now we can go back to Databricks. In the top right corner menu, please select "Manage Account".

In the left menu, we need to select "Data" and choose "Create metastore".

Next, we must specify the name and the region we are using. In the ADLS Gen 2 path, we need to enter <container_name>@<storage_account_name>.dfs.core.windows.net/, which points to the container we created earlier. The trailing forward slash is essential, as it defines the root directory in the container.

The Access Connector ID is the value that we copied earlier, in the format /subscriptions/<YOUR_SUBSCRIPTION_ID>/resourceGroups/<YOUR_RESOURCE_GROUP>/providers/Microsoft.Databricks/accessConnectors/<ACCESS_CONNECTOR_NAME>.

In the next step, we need to select our Databricks workspace, and that's all.

Tests

In Databricks, we can go to the Data Explorer, which will display information about the created metastore. Inside the metastore, an example catalog "main" with the schema "default" is created. To test the metastore, we can create a table using CREATE TABLE main.default.test (ID int);
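The name main.default.test follows Unity Catalog's three-level namespace, catalog.schema.table. As a minimal illustration (a hypothetical helper of my own, not part of the walkthrough), splitting such a name into its three levels looks like this:

```python
# Hypothetical helper illustrating Unity Catalog's three-level namespace
# (catalog.schema.table); not part of the original walkthrough.
def split_table_name(full_name: str) -> tuple[str, str, str]:
    """Split 'catalog.schema.table' into its three parts."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    catalog, schema, table = parts
    return catalog, schema, table

print(split_table_name("main.default.test"))
# -> ('main', 'default', 'test')
```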

4 REPLIES

Aviral-Bhardwaj
Esteemed Contributor III

Informative, thanks for this detailed explanation.

Chaitanya_Raju
Honored Contributor

@Hubert Dudek, thanks for the clear explanation, very helpful to the community.

jose_gonzalez
Moderator

Thank you for sharing @Hubert Dudek

RDD1
New Contributor III

This is very good. Can you explain how it provides data governance across the organization? With this, we create a catalog, and the organization can create branches under it, but governance rules still need to be enforced on who can access what at the table level.
