How to check table size by partition?
05-11-2023 07:08 AM
I want to check the size of a Delta table by partition.
As you can see, I can only check the size of the whole table, not the size of each partition.
05-13-2023 08:57 AM
@jin park :
You can use the Databricks Delta Lake SHOW TABLE EXTENDED command to get the size of each partition of the table. Here's an example:
%sql
SHOW TABLE EXTENDED LIKE '<table_name>'
PARTITION (<partition_column> = '<partition_value>')
SELECT sizeInBytes
Replace <table_name> with the name of your Delta table, <partition_column> with the name of your partition column, and <partition_value> with the specific partition value you want to check the size for. If you want to check the size for all partitions, omit the PARTITION clause.
You can also use the DESCRIBE DETAIL command to get similar information:
%sql
DESCRIBE DETAIL <table_name>
This shows detailed information about the table, including its total size in the sizeInBytes column. Note that this size covers the whole table; it is not broken down by partition.
05-14-2023 12:57 AM
There is no 'sizeInBytes' item.
06-09-2023 12:23 AM
@jin park : Please try this:
DESCRIBE DETAIL your_table_name PARTITION (partition_column = 'partition_value')
Replace your_table_name with the actual name of your table, and set partition_column and partition_value to the partition you want to check.
12-17-2024 04:16 AM
I found a hacky way using the Delta log: find the latest checkpoint Parquet file (or group of files, if the checkpoint is split into parts) in the `_delta_log` directory and use its prefix `000000000000xxxxxxx.checkpoint` as the source:
SELECT
  partition_column_1,
  partition_column_2,
  round(sum(size/1000/1000/1000),2) AS size_gb,  -- add.size is in bytes
  count(*) AS num_files,
  round(min(size/1000/1000),2) AS min_file_size_mb,
  round(max(size/1000/1000),2) AS max_file_size_mb
FROM (
  -- Each checkpoint row holds one action; data files are the rows where `add` is set.
  SELECT
    add.partitionValues.partition_column_1,
    add.partitionValues.partition_column_2,
    add.size AS size
  FROM PARQUET.`s3://my-bucket/my_table/_delta_log/0000000000000xxxxxxx.checkpoint.*`
)
-- Rows for other actions (metadata, protocol, ...) have no partition value; skip them.
WHERE partition_column_1 IS NOT NULL
-- The empty grouping set () adds one grand-total row across all partitions.
GROUP BY GROUPING SETS((), (partition_column_1, partition_column_2))
ORDER BY size_gb DESC
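In case it helps with the "find the latest checkpoint" step, here is a minimal Python sketch that picks the highest-version checkpoint out of a listing of `_delta_log` file names and builds the path pattern used in the query above. The function names `latest_checkpoint_files` and `checkpoint_glob` are made up for illustration, and I am assuming you already have the file names as plain strings (for example from a directory listing). It only handles the classic naming scheme (`<20-digit version>.checkpoint.parquet`, or `<version>.checkpoint.<part>.<num_parts>.parquet` for multi-part checkpoints); it does not check that all parts are present and it ignores other checkpoint naming variants. Also keep in mind that a checkpoint only reflects the table as of its version, so commits written after it are not counted.

```python
import re
from typing import Dict, Iterable, List

# Classic Delta checkpoint file names (assumed naming scheme, see note above):
#   single file: <20-digit version>.checkpoint.parquet
#   multi-part:  <20-digit version>.checkpoint.<10-digit part>.<10-digit num_parts>.parquet
_CHECKPOINT_RE = re.compile(
    r"^(?P<version>\d{20})\.checkpoint(?:\.\d{10}\.\d{10})?\.parquet$"
)


def latest_checkpoint_files(file_names: Iterable[str]) -> List[str]:
    """Return the file name(s) belonging to the highest checkpoint version.

    Hypothetical helper: does not verify that a multi-part checkpoint is complete.
    """
    by_version: Dict[int, List[str]] = {}
    for name in file_names:
        match = _CHECKPOINT_RE.match(name)
        if match:
            by_version.setdefault(int(match["version"]), []).append(name)
    if not by_version:
        return []
    return sorted(by_version[max(by_version)])


def checkpoint_glob(log_dir: str, file_names: Iterable[str]) -> str:
    """Build a '<log_dir>/<version>.checkpoint.*' pattern for the latest checkpoint."""
    files = latest_checkpoint_files(file_names)
    if not files:
        raise ValueError("no checkpoint files found in the listing")
    version = files[0][:20]  # the zero-padded version is the first 20 characters
    return f"{log_dir.rstrip('/')}/{version}.checkpoint.*"


# Example listing; names are generated so the zero padding is always correct.
names = [
    f"{9:020d}.json",
    f"{10:020d}.checkpoint.parquet",
    f"{20:020d}.checkpoint.{1:010d}.{2:010d}.parquet",
    f"{20:020d}.checkpoint.{2:010d}.{2:010d}.parquet",
    f"{21:020d}.json",
    "_last_checkpoint",
]
print(checkpoint_glob("s3://my-bucket/my_table/_delta_log", names))
```

The printed pattern can be pasted into the `FROM PARQUET.` path of the query. Matching on `<version>.checkpoint.*` covers both the single-file and the multi-part case, which is why the sketch returns a pattern instead of individual file names.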