Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.

Hubert-Dudek
Esteemed Contributor III

Unity Catalog: create the first metastore

The great benefit of Unity Catalog is that the data remains ours, stored in an open format in a cloud storage container we control. To set up Unity Catalog, we need to create storage and give Databricks access to it so the metastore can be created through the admin console.

We will use Azure Cloud and Azure Data Lake Storage in this guide.

Storage account

We need to search for “Storage accounts” in the Azure portal.

On the Storage accounts page, we hit the Create button.

On the next page, the most important thing is to use the same region as our Databricks workspace, and on the Advanced tab, to enable the hierarchical namespace so the account is Data Lake Storage Gen2.

After the storage account is created, we need to open it and create the container in which the metastore will be stored.
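For readers who prefer scripting, the same storage setup can be sketched with the Azure CLI. All names below are hypothetical examples; the `az` commands need an authenticated session, so they are shown commented out:

```shell
# Hypothetical names -- replace with your own.
RESOURCE_GROUP="rg-databricks"
STORAGE_ACCOUNT="unitycatalogstore"
CONTAINER="metastore"
LOCATION="westeurope"   # must match the Databricks workspace region

# These commands require an authenticated "az" session, so they are commented out:
# az storage account create \
#   --name "$STORAGE_ACCOUNT" --resource-group "$RESOURCE_GROUP" \
#   --location "$LOCATION" --sku Standard_LRS --kind StorageV2 \
#   --enable-hierarchical-namespace true   # hierarchical namespace = ADLS Gen2
# az storage container create \
#   --account-name "$STORAGE_ACCOUNT" --name "$CONTAINER" --auth-mode login

# The ADLS Gen2 path we will enter in the metastore settings later:
ADLS_PATH="${CONTAINER}@${STORAGE_ACCOUNT}.dfs.core.windows.net/"
echo "$ADLS_PATH"
```

Note the order in the path: the container name comes first, then the storage account name.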


We need to remember the storage account and container names, as we will later use them in the metastore settings as <container_name>@<storage_account_name>.dfs.core.windows.net. Copy and save them so we can use them later.

 

Access Connector for Azure Databricks

Now we need to give Databricks access to our storage. To achieve that, we search for “Access Connector for Azure Databricks”.

Hit Create and remember, again, to use the same region.

After the creation is complete, we must go to the newly created resource and copy the ID of the Access Connector. It is quite long, in the format /subscriptions/<YOUR_SUBSCRIPTION_ID>/resourceGroups/<YOUR_RESOURCE_GROUP>/providers/Microsoft.Databricks/accessConnectors/<ACCESS_CONNECTOR_NAME>. Copy and save it so we can use it later.
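The connector can also be created and its ID assembled from the CLI; a minimal sketch with hypothetical names (the `az databricks access-connector` command comes from the Azure CLI databricks extension and needs an authenticated session, so it is commented out -- check `az databricks access-connector create --help` for the exact flags in your CLI version):

```shell
# Hypothetical values -- replace with your own subscription, resource group, and name.
SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP="rg-databricks"
CONNECTOR_NAME="unity-catalog-connector"

# Creating the connector via CLI (requires: az extension add --name databricks):
# az databricks access-connector create \
#   --resource-group "$RESOURCE_GROUP" --name "$CONNECTOR_NAME" \
#   --location westeurope --identity-type SystemAssigned

# The resource ID we need for the metastore has this exact shape:
CONNECTOR_ID="/subscriptions/${SUBSCRIPTION_ID}/resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.Databricks/accessConnectors/${CONNECTOR_NAME}"
echo "$CONNECTOR_ID"
```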


Grant access to the storage account

Okay, now we need to go back to our Unity Catalog storage account. Inside the storage account, in the left menu, click “Access Control (IAM)” and then “+ Add”.

We need to select the role “Storage Blob Data Contributor”.

We need to select the previously created Access Connector, which is registered as a managed identity. We choose it and hit “Select” and then “Review + assign”.
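This role assignment can also be scripted. A hedged sketch with the same hypothetical names as above; the lookups and the assignment need an authenticated `az` session, so the live commands are commented out:

```shell
# Hypothetical names -- replace with your own.
RESOURCE_GROUP="rg-databricks"
STORAGE_ACCOUNT="unitycatalogstore"
CONNECTOR_NAME="unity-catalog-connector"
ROLE="Storage Blob Data Contributor"

# Look up the connector's managed identity and the storage account's resource ID,
# then grant the role (requires an authenticated "az" session):
# PRINCIPAL_ID=$(az databricks access-connector show \
#   --resource-group "$RESOURCE_GROUP" --name "$CONNECTOR_NAME" \
#   --query identity.principalId -o tsv)
# STORAGE_ID=$(az storage account show \
#   --resource-group "$RESOURCE_GROUP" --name "$STORAGE_ACCOUNT" \
#   --query id -o tsv)
# az role assignment create --assignee "$PRINCIPAL_ID" \
#   --role "$ROLE" --scope "$STORAGE_ID"

echo "Role to assign: $ROLE"
```

Scoping the role to the single storage account (rather than the resource group or subscription) keeps the connector's permissions as narrow as possible.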


Creating metastore

Now we can go back to Databricks. In the menu in the top right corner, select “Manage Account”.

In the left menu, we need to select “Data” and choose “Create metastore”.

Next, we must specify the name and the region we are using. For the ADLS Gen 2 path, we need to enter <container_name>@<storage_account_name>.dfs.core.windows.net/, pointing at the container we created earlier. The trailing forward slash is essential, as it defines the root directory in the container.

The Access Connector ID is the value we copied earlier, in the format /subscriptions/<YOUR_SUBSCRIPTION_ID>/resourceGroups/<YOUR_RESOURCE_GROUP>/providers/Microsoft.Databricks/accessConnectors/<ACCESS_CONNECTOR_NAME>

In the next step, we need to select the Databricks workspace to assign the metastore to, and that’s all.

Tests

In Databricks, we can go to the Data Explorer, which displays information about the created metastore. Inside the metastore, a catalog “main” with the schema “default” is created by default. To test the metastore, we can create a table with CREATE TABLE main.default.test (ID int);

4 REPLIES

Aviral-Bhardwaj
Esteemed Contributor III

Informative, thanks for this detailed explanation.

AviralBhardwaj

Chaitanya_Raju
Honored Contributor

@Hubert Dudek, thanks for the clear explanation, very helpful to the community.

Thanks for reading. Please like if this is useful, and for improvements or feedback, please comment.

jose_gonzalez
Databricks Employee

Thank you for sharing @Hubert Dudek

RDD1
New Contributor III

This is very good. Can you explain how it provides data governance across the organization? With this, we create a catalog, and the organization can create branches under it, but there still need to be governance rules enforced on who can access what at the table level.
