Warehousing & Analytics

Calculate the total size in bytes for a column

peterlandis
New Contributor II

I want to calculate the total size in bytes of a given column in a table. I saw that there is a bit_length function, and I wrote something like the query below. It gives the total bits of the column, but I'm not sure whether this is correct.

SELECT sum(bit_length(to_binary(content, 'UTF-8'))) AS total_bits FROM mytable;
 
When I run DESCRIBE DETAIL, the table's sizeInBytes is far smaller than the result above. Is that because the size reported for the table is compressed, while bit_length measures the data uncompressed?
DESCRIBE DETAIL mytable;
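If the goal is bytes rather than bits, one option is to divide the bit count by 8; Spark SQL also has an octet_length function that returns a string's byte length directly. A sketch, reusing the mytable table and content column from the question:

```sql
-- Total uncompressed size of the column in bytes.
-- octet_length returns the UTF-8 byte length of a string,
-- so no explicit to_binary conversion is needed.
SELECT sum(octet_length(content)) AS total_bytes FROM mytable;

-- Equivalent using bit_length: 8 bits per byte.
SELECT sum(bit_length(to_binary(content, 'UTF-8'))) / 8 AS total_bytes
FROM mytable;
```

Either query measures the logical (decompressed) size of the column data, not the on-disk footprint.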
1 REPLY

-werners-
Esteemed Contributor III

I looked at the docs for bit_length, and they do not state whether the length is measured before or after compression.
However, since Spark decompresses data on read, it is very likely the size before compression.
The table size reported by DESCRIBE DETAIL is read from metadata and reflects the compressed on-disk size.
To be 100% sure, you can try with a file written without compression.
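One way to run that experiment, sketched under the assumption that the table is Parquet-backed and that the session-level codec setting applies to the write (mytable_uncompressed is a hypothetical name):

```sql
-- Disable Parquet compression for this session's writes.
SET spark.sql.parquet.compression.codec = uncompressed;

-- Copy the column into a new table written without compression.
CREATE TABLE mytable_uncompressed AS
SELECT content FROM mytable;

-- sizeInBytes for this table should now be much closer to the
-- sum of the per-row byte lengths computed with bit_length/octet_length.
DESCRIBE DETAIL mytable_uncompressed;
```

Any remaining gap would come from file-format overhead (encoding, metadata, footers) rather than compression.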