Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Get total size of data in a catalog and schema in Unity Catalog

apingle
Contributor

For a KPI dashboard, we need to know the exact size of the data in a catalog and in each schema inside that catalog. What is the best way to do this?

We tried to iterate over all tables and sum the sizeInBytes using the DESCRIBE DETAIL command for the tables. However, since we have a lot of tables, it takes a really long time.

We also tried looking in the information_schema databases for all the catalogs but couldn't find such information there.

2 REPLIES

Anonymous
Not applicable

@Anant Pingle: Please try the Spark catalog API (spark.catalog). It provides programmatic access to metadata about objects such as tables, views, and databases. Note that the example here counts rows per table; it does not return a size in bytes.

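# Replace "my_catalog" and "my_schema" with your catalog and schema names
catalog_name = "my_catalog"
schema_name = "my_schema"

# Get a list of all tables in the schema. listTables takes a database name;
# the "catalog.schema" form assumes a runtime with three-level namespaces (Spark 3.4+).
tables = spark.catalog.listTables(f"{catalog_name}.{schema_name}")

# Count the rows of each table and add them up with Python's built-in sum().
# This is a row count, not a size in bytes, and it scans every table.
total_rows = sum(
    spark.table(f"{catalog_name}.{schema_name}.{table.name}").count()
    for table in tables
    if not table.isTemporary  # temporary views do not live in the schema
)

print(f"{catalog_name}.{schema_name} contains {total_rows} rows in total.")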

Link to the API documentation: https://docs.databricks.com/dev-tools/api/latest/workspace.html
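
Since the question asks for bytes rather than rows, here is a minimal sketch of the DESCRIBE DETAIL approach from the original post, with the per-table calls issued from a thread pool so they do not run one after another. It is not an official Databricks recipe. The helper names (list_tables, table_size_bytes, sizes_by_schema) are made up for illustration, the table_type filter values are an assumption about what information_schema.tables reports in your workspace, and sizeInBytes is only expected for Delta tables, where it describes the latest snapshot of the table.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def list_tables(spark, catalog):
    """Return (schema, table) pairs from the catalog's information_schema.

    Assumption: table_type uses the values 'MANAGED' and 'EXTERNAL'.
    """
    rows = spark.sql(
        f"SELECT table_schema, table_name "
        f"FROM `{catalog}`.information_schema.tables "
        f"WHERE table_type IN ('MANAGED', 'EXTERNAL')"
    ).collect()
    return [(r["table_schema"], r["table_name"]) for r in rows]


def table_size_bytes(spark, catalog, schema, table):
    """Read sizeInBytes from DESCRIBE DETAIL; treat a missing value as 0."""
    row = spark.sql(f"DESCRIBE DETAIL `{catalog}`.`{schema}`.`{table}`").first()
    return row["sizeInBytes"] or 0


def sizes_by_schema(tables, size_fn, max_workers=16):
    """Call size_fn(schema, table) concurrently and sum the results per schema."""
    tables = list(tables)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        sizes = list(pool.map(lambda pair: size_fn(*pair), tables))
    totals = defaultdict(int)
    for (schema, _table), size in zip(tables, sizes):
        totals[schema] += size
    return dict(totals)


# Usage in a notebook where `spark` is already defined:
#   catalog = "my_catalog"
#   per_schema = sizes_by_schema(
#       list_tables(spark, catalog),
#       lambda s, t: table_size_bytes(spark, catalog, s, t),
#   )
#   print(per_schema, sum(per_schema.values()))
```

The aggregation is kept separate from the Spark calls so it can be checked with a stub size function. The thread count is a guess; each DESCRIBE DETAIL is a small metadata query, so a modest pool is usually enough, but this still issues one query per table.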

Anonymous
Not applicable

Hi @Anant Pingle

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
