Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

how to check table size by partition?

tototox
New Contributor III

I want to check the size of the delta table by partition.

As you can see, only the size of the table can be checked, but not by partition.

4 REPLIES

Anonymous
Not applicable

@jin park:

You can use the Databricks Delta Lake SHOW TABLE EXTENDED command to get the size of each partition of the table. Here's an example:

%sql
SHOW TABLE EXTENDED LIKE '<table_name>'
PARTITION (<partition_column> = '<partition_value>')

Replace <table_name> with the name of your Delta table, <partition_column> with the name of your partition column, and <partition_value> with the specific partition value you want to check the size for. If you want to check the size for all partitions, omit the PARTITION clause.

You can also use the DESCRIBE DETAIL command to get similar information:

%sql
DESCRIBE DETAIL <table_name>

This will show you detailed information about the table, including its size, though at the whole-table level rather than per partition.
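If DESCRIBE DETAIL only reports a whole-table size, one generic fallback is to sum data-file sizes per partition directory yourself, assuming the table uses Hive-style `col=value` directory layout. This is a plain-Python sketch, not a Databricks or Delta API (the function name and layout assumption are mine); note it counts every file still on storage, including files already removed from the Delta log but not yet VACUUMed:

```python
import os
from collections import defaultdict

def partition_sizes(table_path):
    """Sum data-file sizes per first-level Hive-style partition directory
    (directories named like 'col=value') under table_path.

    Illustrative fallback only: it reflects what is physically on storage,
    which for a Delta table may include not-yet-VACUUMed files.
    """
    sizes = defaultdict(int)
    for entry in os.listdir(table_path):
        full = os.path.join(table_path, entry)
        # Only descend into partition-style directories such as 'date=2024-01-01'
        if os.path.isdir(full) and "=" in entry:
            for root, _dirs, files in os.walk(full):
                for f in files:
                    sizes[entry] += os.path.getsize(os.path.join(root, f))
    return dict(sizes)
```

On Databricks you would point this (or the equivalent `dbutils.fs.ls` listing) at the table's storage location rather than a local path.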

tototox
New Contributor III

There is no 'sizeInBytes' item in the output.

Anonymous
Not applicable

@jin park: Please try this

DESCRIBE DETAIL your_table_name PARTITION (partition_column = 'partition_value')

Replace 'your_table_name' with the actual name of your table and specify the appropriate partition_column and partition_value you want to check.

Carsten_Herbe
New Contributor II
The previous two answers did not work for me (DBX 15.4).
I found a hacky way using the delta log: find the latest (group of) checkpoint (parquet) file(s) in the delta log and use it as the source prefix `000000000000xxxxxxx.checkpoint`:
SELECT
  partition_column_1,
  partition_column_2,
  round(sum(size/1000/1000/1000), 2) AS size_gb,
  count(*) AS num_files,
  round(min(size/1000/1000), 2) AS min_file_size_mb,
  round(max(size/1000/1000), 2) AS max_file_size_mb
FROM (
  SELECT
    add.partitionValues.partition_column_1,
    add.partitionValues.partition_column_2,
    add.size AS size
  FROM PARQUET.`s3://my-bucket/my_table/_delta_log/0000000000000xxxxxxx.checkpoint.*`
)
WHERE 1=1
  AND partition_column_1 IS NOT NULL
GROUP BY GROUPING SETS ((), (partition_column_1, partition_column_2))
ORDER BY size_gb DESC
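The same idea can be sketched in plain Python against the JSON commit files in `_delta_log`, for cases where no checkpoint exists yet. This is an illustrative sketch only (function name and parameters are mine): it replays `add`/`remove` actions from the `*.json` commits in order and sums live file sizes per partition, which matches the real Delta log action format but ignores checkpoints, so it only suits small logs:

```python
import json
import os
from collections import defaultdict

def partition_sizes_from_log(delta_log_dir, partition_col):
    """Aggregate live data-file sizes per partition value by replaying the
    JSON commits in a Delta table's _delta_log directory.

    Sketch under stated assumptions: reads only *.json commits (no
    checkpoint handling), applying 'add' and 'remove' actions in order.
    """
    live = {}  # file path -> (partition value, size in bytes)
    commits = sorted(f for f in os.listdir(delta_log_dir) if f.endswith(".json"))
    for name in commits:
        with open(os.path.join(delta_log_dir, name)) as fh:
            for line in fh:
                action = json.loads(line)
                if "add" in action:
                    a = action["add"]
                    live[a["path"]] = (a["partitionValues"].get(partition_col),
                                       a["size"])
                elif "remove" in action:
                    live.pop(action["remove"]["path"], None)
    sizes = defaultdict(int)
    for part, size in live.values():
        sizes[part] += size
    return dict(sizes)
```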

 
