Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Get size of metastore specifically

ac0
New Contributor III

Currently my Databricks Metastore is in the same location as the data for my production catalog. We are moving the data to a separate storage account. In advance of this, I'm curious whether there is a way to determine the size of the metastore itself; essentially, I want to find out how large the Azure Storage Account hosting the metastore will be once all the data is moved out. Is there somewhere I can find this? Or does anyone have an estimate, for example as a percentage of total data?


Kaniz_Fatma
Community Manager

Hi @ac0, let’s explore how you can determine the size of your Databricks Metastore and estimate the storage requirements for the Azure Storage Account hosting it.

  1. Metastore Size:

    • The metastore in Unity Catalog is the top-level container for metadata. It registers metadata about securable objects (such as tables, volumes, external locations, and shares) and the permissions that govern access to them.
    • Each metastore exposes a three-level namespace (catalog.schema.table) by which data can be organized.
    • You must have one metastore for each region in which your organization operates.
    • To work with Unity Catalog, users must be on a workspace that is attached to a metastore in their region.
    • To create a metastore, follow these steps:
      1. Optionally create a storage location for metastore-level managed storage. This storage account will contain Unity Catalog managed tables and volumes. It should be an Azure Data Lake Storage Gen2 account in the same region as your Azure Databricks workspaces.
      2. Create an Azure managed identity that gives access to the storage location.
      3. In Azure Databricks, create the metastore, attach the storage location, and assign workspaces to the metastore.
    • Note: You can also create a metastore using the Databricks Terraform provider.
  2. Estimating Storage Account Size:

    • While there isn’t a direct way to determine the exact size of the metastore itself, you can estimate the storage requirements based on the following factors:
      • Number of tables and volumes: The more tables and volumes registered in the metastore, the larger the storage requirements.
      • Metadata overhead: Unity Catalog stores metadata about securable objects, permissions, and namespaces. This overhead contributes to the storage size.
      • Data volume: If the tables and volumes stored in this account contain large amounts of data, that data increases the storage account size.
      • Managed Storage: Consider whether you need metastore-level storage for managed tables and volumes. If so, this will add to the storage requirements.
    • Unfortunately, there isn’t a fixed percentage of total data that directly corresponds to the metastore size. It varies based on your specific use case and data organization.
  3. Best Practices:

Remember that the metastore primarily contains metadata, so its size won’t be nearly as significant as the actual data stored in your tables and volumes. For precise numbers, monitor the storage usage as you gradually move data out of the storage account and observe the impact on its size. 📊🔍
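One way to put a number on this before the move is to list everything in the storage account, total the sizes by top-level folder, and subtract the folders that belong to the data you are moving out. Below is a minimal Python sketch of that arithmetic, assuming the listing comes from the `azure-storage-blob` package, whose `ContainerClient.list_blobs()` yields objects with `name` and `size` attributes. The folder names, sizes, and helper names (`bytes_by_prefix`, `remaining_share`, `human_bytes`) are hypothetical and are not a real Unity Catalog layout or measured ratio:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Protocol


class BlobLike(Protocol):
    """Anything with a blob path and a size in bytes.

    azure-storage-blob's BlobProperties (yielded by
    ContainerClient.list_blobs()) has both attributes.
    """

    name: str
    size: int


def bytes_by_prefix(blobs: Iterable[BlobLike], depth: int = 1) -> dict[str, int]:
    """Sum blob sizes grouped by the first `depth` path segments."""
    totals: dict[str, int] = defaultdict(int)
    for blob in blobs:
        prefix = "/".join(blob.name.split("/")[:depth])
        totals[prefix] += blob.size
    return dict(totals)


def remaining_share(total_before: int, total_after: int) -> float:
    """Fraction of the original footprint still present after the move."""
    if total_before <= 0:
        raise ValueError("total_before must be positive")
    return total_after / total_before


def human_bytes(n: float) -> str:
    """Format a byte count with binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    for unit in units[:-1]:
        if abs(n) < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} {units[-1]}"


# Hypothetical listing standing in for container.list_blobs(); the paths
# and sizes are invented for illustration only.
@dataclass
class FakeBlob:
    name: str
    size: int


sample = [
    FakeBlob("prod_data/sales/part-0000.parquet", 4_000_000_000),
    FakeBlob("prod_data/sales/_delta_log/00000000000000000000.json", 2_000),
    FakeBlob("managed/other_table/part-0000.parquet", 10_000_000),
]

by_prefix = bytes_by_prefix(sample)
before = sum(by_prefix.values())
after = before - by_prefix["prod_data"]  # what stays once prod_data moves out
print(human_bytes(after), f"{remaining_share(before, after):.2%}")  # 9.5 MiB 0.25%
```

To run it against a real account, you could replace `sample` with `ContainerClient.from_container_url("<container URL with SAS>").list_blobs()` and raise `depth` to break large folders down further. Taking one snapshot before the migration and another after each batch gives you the before/after comparison described above.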

 