Get size of metastore specifically

ac0
New Contributor III

Currently my Databricks Metastore is in the same location as the data for my production catalog. We are moving the data to a separate storage account. In advance of this, I'm curious if there is a way to determine the size of the metastore itself; essentially I want to find out what size the Azure Storage Account hosting the metastore will be once all the data is moved out. Is there a way I can find this somewhere? Or does anyone have an estimate as a percentage of total data, etc.?

1 REPLY

Kaniz
Community Manager

Hi @ac0, let’s explore how you can determine the size of your Databricks metastore and estimate the storage requirements for the Azure Storage Account hosting it.

  1. Metastore Size:

    • The metastore in Unity Catalog is the top-level container for data. It registers metadata about securable objects (such as tables, volumes, external locations, and shares) and the permissions that govern access to them.
    • Each metastore exposes a three-level namespace (catalog.schema.table) by which data can be organized.
    • You must have one metastore for each region in which your organization operates.
    • To work with Unity Catalog, users must be on a workspace that is attached to a metastore in their region.
    • To create a metastore, follow these steps:
      1. Optionally create a storage location for metastore-level managed storage. This storage account will contain Unity Catalog managed tables and volumes. It should be an Azure Data Lake Storage Gen2 account in the same region as your Azure Databricks workspaces.
      2. Create an Azure managed identity that gives access to the storage location.
      3. In Azure Databricks, create the metastore, attach the storage location to it, and assign workspaces to the metastore.
    • Note: You can also create a metastore using the Databricks Terraform provider. A quick notebook check to confirm which metastore a workspace is attached to is sketched below.
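
Once a workspace is attached, a quick way to confirm which metastore it is using, and to see the three-level namespace in action, is a short notebook cell like the sketch below. It assumes a Databricks notebook (where spark is predefined); current_metastore() is a built-in Databricks SQL function, and the catalog, schema, and table names are placeholders.

    # Run in a Databricks notebook attached to a Unity Catalog-enabled cluster.
    # current_metastore() returns the ID of the metastore this workspace is assigned to.
    metastore_id = spark.sql("SELECT current_metastore()").collect()[0][0]
    print(f"Workspace is attached to metastore: {metastore_id}")

    # The three-level namespace (catalog.schema.table) in action.
    # 'my_catalog', 'my_schema', and 'my_table' are placeholder names.
    spark.sql("SELECT * FROM my_catalog.my_schema.my_table LIMIT 10").show()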
  2. Estimating Storage Account Size:

    • While there isn’t a direct way to determine the exact size of the metastore itself, you can estimate the storage requirements based on the following factors:
      • Number of Tables and Volumes: The more tables and volumes registered in the metastore, the larger the storage requirements.
      • Metadata Overhead: Unity Catalog stores metadata about securable objects, permissions, and namespaces. This overhead contributes to the storage size.
      • Data Volume: If your tables and volumes contain large amounts of data, it will impact the storage account size.
      • Managed Storage: Consider whether you need metastore-level storage for managed tables and volumes. If so, this will add to the storage requirements.
    • Unfortunately, there isn’t a fixed percentage of total data that directly corresponds to the metastore size; it varies based on your specific use case and data organization. You can, however, measure the current footprint directly, as sketched after this list.
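
If you want a concrete number rather than an estimate, one practical approach is to walk the metastore’s managed storage location from a notebook and sum the file sizes, as in the minimal sketch below. It assumes a Databricks notebook (where dbutils is predefined) and a cluster that can read the location; the abfss:// path is a placeholder for your metastore root, and the walk can be slow on locations with many files.

    # Minimal sketch: sum the size of every file under the metastore's managed storage root.
    # Replace the placeholder path with your own metastore root location.
    METASTORE_ROOT = "abfss://<container>@<storage-account>.dfs.core.windows.net/<metastore-root>"

    def total_size_bytes(path):
        """Recursively sum file sizes under `path` using dbutils.fs.ls."""
        total = 0
        for entry in dbutils.fs.ls(path):
            # Directories returned by dbutils.fs.ls have names ending in "/".
            if entry.name.endswith("/"):
                total += total_size_bytes(entry.path)
            else:
                total += entry.size
        return total

    size_gib = total_size_bytes(METASTORE_ROOT) / (1024 ** 3)
    print(f"Approximate size under metastore root: {size_gib:.2f} GiB")

On very large locations, the storage account’s capacity metrics in the Azure portal give a similar answer without listing every file.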
  3. Best Practices:

    • Remember that the metastore primarily contains metadata, so its size won’t be as significant as the actual data stored in your tables and volumes.
    • For precise estimates, monitor the storage usage as you gradually move data out of the metastore and observe the impact on the storage account size. 📊🔍
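
One way to follow that monitoring advice from a notebook is to periodically count what remains registered in the metastore while the data is being relocated. A minimal sketch, assuming the Unity Catalog system tables (system.information_schema) are available in your workspace:

    # Count the tables still registered per catalog and table type (MANAGED, EXTERNAL, VIEW).
    remaining = spark.sql("""
        SELECT table_catalog, table_type, COUNT(*) AS table_count
        FROM system.information_schema.tables
        GROUP BY table_catalog, table_type
        ORDER BY table_catalog, table_type
    """)
    remaining.show(truncate=False)

Re-running this as the migration progresses, alongside the storage account’s capacity metrics, shows whether everything you expected to move has actually left the metastore’s storage.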

 