Carsten_Herbe
New Contributor II
The previous two answers did not work for me (DBX 15.4).
I found a hacky way using the delta log: f
ind latest (group of) checkpoint (parquet) file(s) in delta log and use it as source prefix `000000000000xxxxxxx.checkpoint`:
SELECT
partition_column_1,
partition_column_2,
round(sum(size/1000/1000/1000),2) AS size_gb,
count(*) AS num_files,
round(min(size/1000/1000),2) AS min_file_size_mb,
round(max(size/1000/1000),2) AS max_file_size_mb
FROM (
SELECT
add.partitionValues.partition_column_1,
add.partitionValues.partition_column_2,
add.size AS size
FROM PARQUET.`s3://my-bucket/my_table/_delta_log/0000000000000xxxxxxx.checkpoint.*`
)
WHERE 1=1
AND partition_column_1 IS NOT NULL
GROUP BY GROUPING SETS((), (partition_column_1, partition_column_2))
ORDER BY size_gb DESC