Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.

Calculate the total size in bytes for a column

peterlandis
New Contributor II

I want to calculate the total size in bytes of a given column in a table. I saw that you can use the bit_length function and wrote the query below, which gives the total bits of the column, but I'm not sure if this is correct.

SELECT sum(bit_length(to_binary(content, 'UTF-8'))) AS total_bits FROM mytable;
 
When I run DESCRIBE DETAIL, the table's sizeInBytes is way less than the sum above. Is that because sizeInBytes reflects the compressed size on disk, while bit_length is calculated on the uncompressed data?
DESCRIBE DETAIL mytable;
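
For reference, since bit_length returns bits, dividing by 8 should give bytes; Spark SQL's octet_length should return the UTF-8 byte count directly, making the to_binary cast unnecessary. A minimal sketch using the same content column and mytable from the question:

-- Two ways to get total bytes for the column; both ignore on-disk compression.
SELECT
  sum(bit_length(content)) / 8  AS total_bytes_via_bits,
  sum(octet_length(content))    AS total_bytes
FROM mytable;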
1 REPLY

-werners-
Esteemed Contributor III

I looked at the docs for bit_length, and they don't state whether it is measured before or after compression.
However, since Spark decompresses data on read, it is very likely the size before compression.
The table size reported by DESCRIBE DETAIL is read from metadata and reflects the compressed files on disk.
To be 100% sure, you can try with a file without compression.
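
A minimal sketch of that test, assuming mytable_uncompressed is a hypothetical scratch table name and that the session-level Parquet codec conf is honored by the CTAS write:

-- Disable Parquet compression for this session (assumption: the conf
-- applies to the table write), then rewrite just the one column.
SET spark.sql.parquet.compression.codec = uncompressed;
CREATE TABLE mytable_uncompressed AS SELECT content FROM mytable;

-- sizeInBytes should now be much closer to sum(bit_length(content)) / 8.
DESCRIBE DETAIL mytable_uncompressed;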
