Data Engineering
Get total size of data in a catalog and schema in Unity Catalog

apingle
Contributor

For a KPI dashboard, we need to know the exact size of the data in a catalog, as well as of each schema inside the catalog. What is the best way to do this?

We tried iterating over all tables and summing the sizeInBytes values returned by the DESCRIBE DETAIL command. However, since we have a lot of tables, this takes a very long time.

We also looked in the information_schema of each catalog but couldn't find any size information there.

2 REPLIES

Anonymous
Not applicable

@Anant Pingle: Please try Databricks' Metadata API, which provides programmatic access to metadata about Databricks objects such as tables, views, and databases.

# Replace "my_catalog" with the name of your catalog
catalog_name = "my_catalog"
 
# Get a list of all tables in the catalog
# (on Unity Catalog, listTables expects a schema, so pass
# "catalog.schema" or run USE CATALOG first)
tables = spark.catalog.listTables(catalog_name)
 
# Count the rows of each table and add them up with the built-in
# sum(); pyspark.sql.functions.sum operates on Columns, not Python
# lists, so importing it here would shadow the built-in and fail
total_size = sum(spark.table(table.database + "." + table.name).count() for table in tables)
 
# Note: count() measures rows, not bytes on disk
print(f"The total size of {catalog_name} is {total_size} rows.")

Link to the API documentation: https://docs.databricks.com/dev-tools/api/latest/workspace.html

Anonymous
Not applicable

Hi @Anant Pingle,

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!