
Hubert-Dudek
Esteemed Contributor III

Unity Catalog: create the first metastore

The great benefit of Unity Catalog is that the data remains ours and is stored in an open format on cloud storage, in our own container. To set up Unity Catalog, we need to create the storage and give Databricks access to it, so that the metastore can be created through the admin console.

In this guide we will use Azure and Azure Data Lake Storage.

Storage account

We need to search for “Storage accounts” in the Azure portal.

On the Storage accounts page, we hit the Create button.

On the next page, the most important thing is to use the same region as our Databricks workspace, and on the Advanced page, to enable Data Lake Storage Gen2 (hierarchical namespace).

Once the storage account is created, we need to create a container in which the metastore will be stored.
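For readers who prefer scripting over the portal, the two steps above can be sketched roughly as Azure CLI calls. The Python below only assembles the command lines and does not run them. The function name, the example names, and the `Standard_LRS` SKU are assumptions for illustration, not something from the original walkthrough.

```python
import shlex


def storage_setup_commands(account, resource_group, region, container):
    """Return az CLI argument lists for the storage account and its container."""
    create_account = [
        "az", "storage", "account", "create",
        "--name", account,
        "--resource-group", resource_group,
        "--location", region,                      # same region as the Databricks workspace
        "--sku", "Standard_LRS",                   # assumption: pick the redundancy you need
        "--kind", "StorageV2",
        "--enable-hierarchical-namespace", "true", # this is what makes it Data Lake Storage Gen2
    ]
    create_container = [
        "az", "storage", "container", "create",
        "--name", container,
        "--account-name", account,
        "--auth-mode", "login",
    ]
    return [create_account, create_container]


# Hypothetical names, for illustration only.
for cmd in storage_setup_commands("mystorageaccount", "my-rg", "westeurope", "metastore"):
    print(shlex.join(cmd))
```

The printed lines can be reviewed and pasted into a shell where the Azure CLI is already logged in.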


We need to remember the storage account and container names, as we will use them later in the metastore settings in the form <container_name>@<storage_account_name>.dfs.core.windows.net/ (container first, then storage account). Copy and save this value for later.
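To avoid mixing up the two names, the value can be assembled with a small helper. This is only a sketch: the function name `adls_root_path` and the example names are made up, and the ordering (container first, then storage account) follows the usual ABFS URI convention.

```python
def adls_root_path(container: str, storage_account: str, directory: str = "") -> str:
    """Build the ADLS Gen2 path as container@account.dfs.core.windows.net/directory."""
    if not container or not storage_account:
        raise ValueError("container and storage account names are both required")
    # Strip stray slashes so the host part is always followed by exactly one "/".
    directory = directory.strip("/")
    return f"{container}@{storage_account}.dfs.core.windows.net/{directory}"


# Hypothetical names, for illustration only.
print(adls_root_path("metastore", "mystorageaccount"))
# prints: metastore@mystorageaccount.dfs.core.windows.net/
```

With an empty `directory` the result ends in a forward slash, which points at the root of the container.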

 

Access Connector for Azure Databricks

Now we need to give Databricks access to our storage. To do that, we search for “Access Connector for Azure Databricks” in the Azure portal.

Hit Create, and remember again to use the same region.

After the creation is complete, we must go to the newly created resource. From there, we need to copy the ID of the Access Connector. It is pretty long, in the format /subscriptions/<YOUR_SUBSCRIPTION_ID>/resourceGroups/<YOUR_RESOURCE_GROUP>/providers/Microsoft.Databricks/accessConnectors/<ACCESS_CONNECTOR_NAME>. Copy and save it, as we will use it later.
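Because the ID is long and easy to truncate when copying, a quick sanity check can help. The sketch below parses the format quoted above with a regular expression. The helper name and the sample values are hypothetical, and matching is case-insensitive on the assumption that Azure resource IDs are not case-sensitive.

```python
import re

# Pattern for the Access Connector resource ID format quoted above.
_CONNECTOR_ID = re.compile(
    r"^/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Databricks/accessConnectors/(?P<name>[^/]+)$",
    re.IGNORECASE,
)


def parse_access_connector_id(resource_id: str) -> dict:
    """Split an Access Connector resource ID into its parts, or raise ValueError."""
    match = _CONNECTOR_ID.match(resource_id.strip())
    if match is None:
        raise ValueError(f"not an Access Connector resource ID: {resource_id!r}")
    return match.groupdict()


# Hypothetical ID, for illustration only.
sample = (
    "/subscriptions/0000/resourceGroups/my-rg"
    "/providers/Microsoft.Databricks/accessConnectors/my-connector"
)
print(parse_access_connector_id(sample))
```

A `ValueError` here usually means part of the ID was lost while copying.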


Grant access to the storage account

Okay, now we need to go back to the storage account we created for Unity Catalog. Inside the storage account, in the left menu, please click “Access Control (IAM)” and then “+ Add”.

We need to select the role “Storage Blob Data Contributor”.

Then we need to select the previously created Access Connector. It is registered as a managed identity. We must choose it and hit “Select” and then “Review + Assign”.
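The same role assignment can also be expressed as an Azure CLI call instead of portal clicks. The sketch below only builds the command line. It assumes you have looked up the principal (object) ID of the Access Connector's managed identity, which is a different value from the resource ID copied earlier. The function name and all example values are hypothetical.

```python
import shlex


def role_assignment_command(principal_id, subscription_id, resource_group, storage_account):
    """Return az CLI arguments granting Storage Blob Data Contributor on the storage account."""
    # The scope is the resource ID of the storage account itself.
    scope = (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
    )
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", principal_id,        # managed identity's principal ID
        "--assignee-principal-type", "ServicePrincipal",
        "--role", "Storage Blob Data Contributor",
        "--scope", scope,
    ]


# Hypothetical values, for illustration only.
cmd = role_assignment_command("1111-2222", "0000", "my-rg", "mystorageaccount")
print(shlex.join(cmd))
```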


Creating metastore

Now we can go back to Databricks. In the top right corner menu, please select “Manage Account”.

In the left menu, we need to select “Data” and choose “Create metastore”.

Next, we must specify the name and the region we are using. In the ADLS Gen 2 path field, we need to enter <container_name>@<storage_account_name>.dfs.core.windows.net/, using the container and storage account we created earlier. The trailing forward slash is essential, as it defines the root directory in the container.

The Access Connector ID is the value that we copied earlier, in the format /subscriptions/<YOUR_SUBSCRIPTION_ID>/resourceGroups/<YOUR_RESOURCE_GROUP>/providers/Microsoft.Databricks/accessConnectors/<ACCESS_CONNECTOR_NAME>

In the next step, we need to select our Databricks workspace, and that’s all.

Tests

In Databricks, we can go to Data Explorer, which displays information about the created metastore. Inside the metastore, an example catalog “main” with schema “default” is created. To test the metastore, we can create a table using CREATE TABLE main.default.test (ID int);
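If you would rather run the check from a notebook, something along these lines may work. It is a minimal sketch: the helper names are made up, and the INSERT and DROP statements are additions beyond the single CREATE TABLE from the post, included so the test also exercises a write and cleans up after itself.

```python
from typing import Callable, List


def smoke_test_statements(catalog: str = "main", schema: str = "default",
                          table: str = "test") -> List[str]:
    """Return SQL statements that create, write to, and drop a small test table."""
    name = f"{catalog}.{schema}.{table}"
    return [
        f"CREATE TABLE IF NOT EXISTS {name} (ID int)",
        f"INSERT INTO {name} VALUES (1)",
        f"DROP TABLE {name}",
    ]


def run_smoke_test(execute: Callable[[str], object]) -> int:
    """Pass each statement to the given executor and return how many were run."""
    statements = smoke_test_statements()
    for statement in statements:
        execute(statement)
    return len(statements)
```

In a Databricks notebook attached to a Unity Catalog enabled cluster, where a `spark` session is predefined, calling `run_smoke_test(spark.sql)` would run the three statements in order.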

4 REPLIES

Aviral-Bhardwaj
Esteemed Contributor III

Informative, thanks for this detailed explanation.

AviralBhardwaj

Chaitanya_Raju
Honored Contributor

@Hubert Dudek, thanks for the clear explanation, very helpful to the community.

Thanks for reading. Please like if this is useful, and comment with any improvements or feedback.

jose_gonzalez
Moderator

Thank you for sharing, @Hubert Dudek.

RDD1
New Contributor III

This is very good. Can you explain how it provides data governance across the organization? With this we create a catalog, and the organization can create branches under that catalog, but governance rules still need to be enforced on who can access what at the table level.
