01-06-2023 08:13 AM
Unity Catalog: create the first metastore
A major benefit of Unity Catalog is that the data stays ours: it is stored in an open format on cloud storage, in our own container. To set up Unity Catalog, we need to create storage and give Databricks access to that storage, so that a metastore can be created through the account console.
In this guide, we will use Azure and Azure Data Lake Storage (ADLS) Gen2.
Storage account
We need to search for "Storage accounts" in the Azure portal.
In Storage accounts, we hit the Create button.
On the next page, the most important setting is the region: it must be the same as our Databricks workspace region. On the Advanced tab, we need to enable the hierarchical namespace so that the account is created as Data Lake Storage Gen2.
After the storage account is created, we go to it and create a container in which the metastore will be stored.
We need to note the storage account name and the container name, as we will later enter them in the metastore settings as <container_name>@<storage_account_name>.dfs.core.windows.net/ (the container name comes before the @, the storage account name after it). Copy and save this path for later.
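Because the order of the two names in this path is easy to get wrong, here is a small sketch that assembles it and rejects names Azure would not accept. The helper name `metastore_storage_path` and the example names `ucmetastoresa` and `metastore` are hypothetical, and the naming rules in the comments are a summary of the Azure rules as I understand them, so double-check them against the Azure documentation.

```python
import re

# Azure naming rules, summarized (assumption: verify against the Azure docs):
# - storage account: 3-24 characters, lowercase letters and digits only
# - container: 3-63 characters, lowercase letters, digits and hyphens, starting
#   with a letter or digit, with no consecutive hyphens and no trailing hyphen
ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
CONTAINER_RE = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")


def metastore_storage_path(storage_account: str, container: str) -> str:
    """Return the ADLS Gen2 location for the metastore root, ending with a slash."""
    if not ACCOUNT_RE.fullmatch(storage_account):
        raise ValueError(f"invalid storage account name: {storage_account!r}")
    if not CONTAINER_RE.fullmatch(container):
        raise ValueError(f"invalid container name: {container!r}")
    # The container name goes before the "@", the storage account name after it.
    return f"{container}@{storage_account}.dfs.core.windows.net/"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(metastore_storage_path("ucmetastoresa", "metastore"))
```

Prepending abfss:// to the result gives the full ABFS URI of the same location, for example abfss://metastore@ucmetastoresa.dfs.core.windows.net/.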
Access Connector for Azure Databricks
Now we need to give Databricks access to our storage. To achieve that, we search for "Access Connector for Azure Databricks" in the Azure portal.
We hit Create and, again, remember to use the same region.
After the deployment is complete, we go to the newly created resource and copy the resource ID of the Access Connector. It is fairly long and has the format /subscriptions/<YOUR_SUBSCRIPTION_ID>/resourceGroups/<YOUR_RESOURCE_GROUP>/providers/Microsoft.Databricks/accessConnectors/<ACCESS_CONNECTOR_NAME>. Copy and save it, as we will use it later.
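To make sure the copied value really has this shape (and not, say, the ID of the storage account), a small sketch can build and parse it. The function names and the example subscription, resource group, and connector names are hypothetical; only the ID layout comes from the format above.

```python
import re

# Shape of the Access Connector resource ID shown in the Azure portal.
CONNECTOR_ID_RE = re.compile(
    r"/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Databricks/accessConnectors/(?P<name>[^/]+)",
    re.IGNORECASE,  # segment names are matched case-insensitively here
)


def build_access_connector_id(subscription: str, resource_group: str, name: str) -> str:
    """Assemble the resource ID from its three variable parts."""
    return (
        f"/subscriptions/{subscription}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Databricks/accessConnectors/{name}"
    )


def parse_access_connector_id(resource_id: str) -> dict:
    """Split a copied resource ID back into its parts, rejecting anything else."""
    match = CONNECTOR_ID_RE.fullmatch(resource_id.strip())
    if match is None:
        raise ValueError(f"not an Access Connector resource ID: {resource_id!r}")
    return match.groupdict()


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    rid = build_access_connector_id("<SUBSCRIPTION_ID>", "rg-unity", "uc-connector")
    print(rid)
    print(parse_access_connector_id(rid))
```

Stripping surrounding whitespace before matching helps with values pasted from the portal, which sometimes carry a trailing newline.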
Grant access to the storage account
Now we need to go back to our Unity Catalog storage account. In the storage account's left menu, we click "Access Control (IAM)", then "+ Add" and "Add role assignment".
We need to select the role "Storage Blob Data Contributor".
Next, we assign access to a managed identity, because the Access Connector is registered as a managed identity. We choose the previously created Access Connector and hit "Select" and then "Review + assign".
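The same role assignment can also be scripted with the Azure CLI. The sketch below only builds (and prints) the `az role assignment create` command rather than running it; the helper names are hypothetical, and it assumes you have looked up the object (principal) ID of the Access Connector's managed identity yourself.

```python
import shlex


def storage_account_scope(subscription: str, resource_group: str, account: str) -> str:
    """Resource ID of the storage account; the role is assigned at this scope."""
    return (
        f"/subscriptions/{subscription}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )


def role_assignment_command(principal_id: str, scope: str) -> list[str]:
    """Argument list for an Azure CLI call mirroring the portal steps above."""
    return [
        "az", "role", "assignment", "create",
        # Object (principal) ID of the Access Connector's managed identity.
        "--assignee-object-id", principal_id,
        # A managed identity is represented by a service principal.
        "--assignee-principal-type", "ServicePrincipal",
        "--role", "Storage Blob Data Contributor",
        "--scope", scope,
    ]


if __name__ == "__main__":
    # Placeholders, for illustration only; the command is printed, not executed.
    scope = storage_account_scope("<SUBSCRIPTION_ID>", "<RESOURCE_GROUP>", "ucmetastoresa")
    print(shlex.join(role_assignment_command("<PRINCIPAL_ID>", scope)))
```

Keeping the command as a list avoids shell-quoting mistakes with the space-separated role name; `shlex.join` is used only to display it.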
Creating metastore
Now we can go back to Databricks. In the menu in the top right corner, we select "Manage Account".
In the left menu, we select "Data" and choose "Create metastore".
Next, we specify a name and the region we are using. In the ADLS Gen 2 path field, we enter <container_name>@<storage_account_name>.dfs.core.windows.net/, which we saved earlier. The trailing forward slash is essential, as it defines the root directory in the container.
The Access Connector ID is the value we copied earlier, in the format /subscriptions/<YOUR_SUBSCRIPTION_ID>/resourceGroups/<YOUR_RESOURCE_GROUP>/providers/Microsoft.Databricks/accessConnectors/<ACCESS_CONNECTOR_NAME>
In the next step, we assign the metastore to our Databricks workspace, and that's all.
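Before submitting the form, the values can be sanity-checked against the points made so far: same region as the storage, container before the @, trailing slash, and the connector ID format. `MetastoreForm` is a hypothetical helper, not part of any Databricks API, and the region and resource names in the example are placeholders.

```python
import re
from dataclasses import dataclass

ADLS_PATH_RE = re.compile(r"[a-z0-9-]+@[a-z0-9]+\.dfs\.core\.windows\.net(/.*)?")
CONNECTOR_ID_RE = re.compile(
    r"/subscriptions/[^/]+/resourceGroups/[^/]+"
    r"/providers/Microsoft\.Databricks/accessConnectors/[^/]+",
    re.IGNORECASE,
)


@dataclass
class MetastoreForm:
    """Hypothetical holder for the values typed into the Create metastore form."""
    name: str
    region: str
    adls_path: str
    access_connector_id: str

    def problems(self, storage_region: str) -> list[str]:
        """Return human-readable issues; an empty list means nothing obvious is wrong."""
        issues = []
        if not self.name.strip():
            issues.append("metastore name is empty")
        if self.region.lower() != storage_region.lower():
            issues.append("metastore region differs from the storage account region")
        if not ADLS_PATH_RE.fullmatch(self.adls_path):
            issues.append("path should look like <container>@<account>.dfs.core.windows.net/")
        if not self.adls_path.endswith("/"):
            issues.append("path must end with a forward slash")
        if not CONNECTOR_ID_RE.fullmatch(self.access_connector_id):
            issues.append("access connector ID has an unexpected format")
        return issues


if __name__ == "__main__":
    form = MetastoreForm(
        name="main-metastore",
        region="westeurope",
        adls_path="metastore@ucmetastoresa.dfs.core.windows.net",  # slash forgotten
        access_connector_id="/subscriptions/<SUB>/resourceGroups/<RG>"
        "/providers/Microsoft.Databricks/accessConnectors/uc-connector",
    )
    print(form.problems(storage_region="westeurope"))
    # prints ['path must end with a forward slash']
```

Returning a list of messages instead of raising on the first problem shows every mistake in one pass.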
Tests
In Databricks, we can go to Data Explorer, which displays information about the created metastore. Inside the metastore, an example catalog "main" with the schema "default" is created. To test the metastore, we can create a table using CREATE TABLE main.default.test (ID int);
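The test table uses Unity Catalog's three-level name, catalog.schema.table. As a small sketch, the statement can be generated for any catalog and schema, quoting each part with backticks so names containing hyphens or reserved words still parse. The helper names are hypothetical; the backtick-doubling rule is the standard Spark SQL identifier escaping.

```python
def quote_identifier(part: str) -> str:
    """Quote one name part with backticks, doubling any backtick inside it."""
    return "`" + part.replace("`", "``") + "`"


def create_test_table_sql(catalog: str = "main", schema: str = "default", table: str = "test") -> str:
    """Build the test CREATE TABLE statement for any catalog.schema.table."""
    full_name = ".".join(quote_identifier(p) for p in (catalog, schema, table))
    return f"CREATE TABLE {full_name} (ID INT);"


if __name__ == "__main__":
    print(create_test_table_sql())
    # prints CREATE TABLE `main`.`default`.`test` (ID INT);
```

The printed statement can be pasted into a notebook SQL cell or the SQL editor in the workspace attached to the new metastore.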
01-07-2023 08:08 AM
Informative, thanks for this detailed explanation.
01-07-2023 09:34 AM
@Hubert Dudek, thanks for the clear explanation, very helpful to the community.
04-06-2023 11:43 AM
Thank you for sharing, @Hubert Dudek
05-08-2023 11:34 AM
This is very good. Can you explain how it provides data governance across the organization? With this, we create a catalog and the organization can create branches under this catalog, but there still need to be governance rules enforced on who can access what at the table level.