<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Find the size of delta table for each month before partition in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/find-the-size-of-delta-table-for-each-month-before-partition/m-p/58408#M31129</link>
    <description>&lt;P&gt;I think the only option here is to run a count grouped by the partition column. That will give you the number of rows in each partition.&lt;/P&gt;</description>
    <pubDate>Thu, 25 Jan 2024 11:42:06 GMT</pubDate>
    <dc:creator>Lakshay</dc:creator>
    <dc:date>2024-01-25T11:42:06Z</dc:date>
    <item>
      <title>Find the size of delta table for each month before partition</title>
      <link>https://community.databricks.com/t5/data-engineering/find-the-size-of-delta-table-for-each-month-before-partition/m-p/58384#M31121</link>
      <description>&lt;P&gt;We have 38 Delta tables, and we decided to partition them by month.&lt;/P&gt;&lt;P&gt;However, some of these tables are small, so we need to find the size of each Delta table per month. That way we can decide whether to use partitioning or Z-ordering.&lt;/P&gt;&lt;P&gt;Is there a way to find the size of a Delta table for each month?&lt;/P&gt;</description>
      <pubDate>Thu, 25 Jan 2024 06:48:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/find-the-size-of-delta-table-for-each-month-before-partition/m-p/58384#M31121</guid>
      <dc:creator>chandraprakash</dc:creator>
      <dc:date>2024-01-25T06:48:07Z</dc:date>
    </item>
    <item>
      <title>Re: Find the size of delta table for each month before partition</title>
      <link>https://community.databricks.com/t5/data-engineering/find-the-size-of-delta-table-for-each-month-before-partition/m-p/58408#M31129</link>
      <description>&lt;P&gt;I think the only option here is to run a count grouped by the partition column. That will give you the number of rows in each partition.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Jan 2024 11:42:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/find-the-size-of-delta-table-for-each-month-before-partition/m-p/58408#M31129</guid>
      <dc:creator>Lakshay</dc:creator>
      <dc:date>2024-01-25T11:42:06Z</dc:date>
    </item>
    <item>
      <title>Re: Find the size of delta table for each month before partition</title>
      <link>https://community.databricks.com/t5/data-engineering/find-the-size-of-delta-table-for-each-month-before-partition/m-p/58509#M31191</link>
      <description>&lt;P&gt;For your tables, I’m curious whether you could use Liquid Clustering to reduce some of the maintenance burden of choosing between Z-Order and partitioning.&lt;/P&gt;
&lt;P&gt;That said, one potential approach is to read the Delta transaction log and look at the file information in its &lt;CODE&gt;add&lt;/CODE&gt; actions, which includes each data file's path and size.&lt;/P&gt;
&lt;P&gt;You can query these &lt;CODE&gt;add&lt;/CODE&gt; entries in the transaction log directly, pick out all the files associated with a particular month, and sum their sizes.&lt;/P&gt;
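&lt;P&gt;The steps above can be sketched in Python as follows. This is a minimal, unofficial sketch that assumes: the table's &lt;CODE&gt;_delta_log&lt;/CODE&gt; folder is readable as a local path; no checkpoint has replaced the older JSON commit files; and &lt;CODE&gt;event_date&lt;/CODE&gt; (a hypothetical name) is a date or timestamp column with file-level min/max statistics. It replays the &lt;CODE&gt;add&lt;/CODE&gt; and &lt;CODE&gt;remove&lt;/CODE&gt; actions to find the live files, then credits a file's size to a month only when its min and max values fall in the same month.&lt;/P&gt;
&lt;PRE&gt;
```python
import json
from collections import defaultdict
from pathlib import Path


def monthly_sizes(table_path, date_col):
    """Estimate on-disk bytes per month for an unpartitioned Delta table.

    Sketch only: replays the JSON commit files in _delta_log and ignores
    checkpoint (.parquet) files, so it assumes older commits have not been
    cleaned up after a checkpoint. `date_col` is a hypothetical date or
    timestamp column whose min/max statistics are recorded per file.
    """
    live = {}  # path -> add action, for files currently in the table
    # Commit files are named with zero-padded version numbers, so a
    # lexical sort replays them in commit order.
    for commit in sorted(Path(table_path, "_delta_log").glob("*.json")):
        if not commit.stem.isdigit():
            continue  # skip anything that is not a plain commit file
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live[action["add"]["path"]] = action["add"]
            elif "remove" in action:
                live.pop(action["remove"]["path"], None)

    totals = defaultdict(int)
    for add in live.values():
        # "stats" is itself a JSON-encoded string and may be absent.
        stats = json.loads(add.get("stats") or "{}")
        min_v = stats.get("minValues", {}).get(date_col)
        max_v = stats.get("maxValues", {}).get(date_col)
        if min_v is None or max_v is None:
            month = "unknown"  # no min/max stats for this column
        elif min_v[:7] == max_v[:7]:
            month = min_v[:7]  # "YYYY-MM": file lies within one month
        else:
            month = "spans-months"  # file straddles a month boundary
        totals[month] += add["size"]  # size is in bytes
    return dict(totals)
```
&lt;/PRE&gt;
&lt;P&gt;Calling &lt;CODE&gt;monthly_sizes("/path/to/table", "event_date")&lt;/CODE&gt; (hypothetical path and column) returns a dict mapping each "YYYY-MM" month to bytes, plus "spans-months" and "unknown" buckets for files that cannot be attributed to a single month. These are compressed on-disk sizes; the bytes in the "spans-months" bucket would need to be apportioned by reading those files.&lt;/P&gt;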
&lt;P&gt;You can also find more info about this at&amp;nbsp;&lt;A title="A peek into the Delta Lake Transaction Log" href="https://dennyglee.com/2024/01/03/a-peek-into-the-delta-lake-transaction-log/" target="_blank" rel="noopener"&gt;A peek into the Delta Lake Transaction Log&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;HTH!&lt;/P&gt;</description>
      <pubDate>Sat, 27 Jan 2024 07:41:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/find-the-size-of-delta-table-for-each-month-before-partition/m-p/58509#M31191</guid>
      <dc:creator>dennyglee</dc:creator>
      <dc:date>2024-01-27T07:41:27Z</dc:date>
    </item>
  </channel>
</rss>

