<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Calculate the total size in bytes for a column in Warehousing &amp; Analytics</title>
    <link>https://community.databricks.com/t5/warehousing-analytics/calculate-the-total-size-in-bytes-for-a-column/m-p/39451#M846</link>
    <description>&lt;P&gt;The bit_length documentation does not say whether the result is measured before or after compression.&lt;BR /&gt;However, since Spark decompresses data on read, it is very likely that bit_length reports the uncompressed size.&lt;BR /&gt;The table size is read from metadata and reflects the compressed data.&lt;BR /&gt;To be 100% sure, you can repeat the test with a file written without compression.&lt;/P&gt;</description>
    <pubDate>Wed, 09 Aug 2023 14:59:26 GMT</pubDate>
    <dc:creator>-werners-</dc:creator>
    <dc:date>2023-08-09T14:59:26Z</dc:date>
    <item>
      <title>Calculate the total size in bytes for a column</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/calculate-the-total-size-in-bytes-for-a-column/m-p/39449#M845</link>
      <description>&lt;P&gt;I want to calculate the total size in bytes of a given column in a table. I saw that you can use the bit_length function and wrote the query below, which gives the total number of bits in the column, but I am not sure whether this is correct.&lt;/P&gt;&lt;DIV&gt;SELECT sum(bit_length(to_binary(content, 'UTF-8'))) AS total_bits FROM mytable;&lt;/DIV&gt;&lt;P&gt;When I run DESCRIBE DETAIL, the table's sizeInBytes is far less than the result above. Is that because the size reported for the table is compressed, whereas bit_length is computed on uncompressed data?&lt;/P&gt;&lt;DIV&gt;DESCRIBE DETAIL mytable;&lt;/DIV&gt;</description>
      <pubDate>Wed, 09 Aug 2023 14:38:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/calculate-the-total-size-in-bytes-for-a-column/m-p/39449#M845</guid>
      <dc:creator>peterlandis</dc:creator>
      <dc:date>2023-08-09T14:38:15Z</dc:date>
    </item>
    <item>
      <title>Re: Calculate the total size in bytes for a column</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/calculate-the-total-size-in-bytes-for-a-column/m-p/39451#M846</link>
      <description>&lt;P&gt;The bit_length documentation does not say whether the result is measured before or after compression.&lt;BR /&gt;However, since Spark decompresses data on read, it is very likely that bit_length reports the uncompressed size.&lt;BR /&gt;The table size is read from metadata and reflects the compressed data.&lt;BR /&gt;To be 100% sure, you can repeat the test with a file written without compression.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Aug 2023 14:59:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/calculate-the-total-size-in-bytes-for-a-column/m-p/39451#M846</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2023-08-09T14:59:26Z</dc:date>
    </item>
  </channel>
</rss>

