<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How do I reduce the size of a hive table's S3 bucket in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-do-i-reduce-the-size-of-a-hive-table-s-s3-bucket/m-p/8198#M3906</link>
    <description>&lt;P&gt;When you run updates, deletes, etc. on a Delta table, new files are created. However, the old files are not automatically deleted, which is what allows features like time travel to work on Delta tables.&lt;/P&gt;&lt;P&gt;To delete the older files of a Delta table, you can use the VACUUM command. It removes data files that are no longer referenced by the table and are older than the retention threshold, so the current data in the table is kept.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/sql/language-manual/delta-vacuum.html" alt="https://docs.databricks.com/sql/language-manual/delta-vacuum.html" target="_blank"&gt;https://docs.databricks.com/sql/language-manual/delta-vacuum.html&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 06 Mar 2023 21:52:20 GMT</pubDate>
    <dc:creator>apingle</dc:creator>
    <dc:date>2023-03-06T21:52:20Z</dc:date>
    <item>
      <title>How do I reduce the size of a hive table's S3 bucket</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-reduce-the-size-of-a-hive-table-s-s3-bucket/m-p/8197#M3905</link>
      <description>&lt;P&gt;I have a Hive table in Delta format with over 1B rows. When I check the Data Explorer in the SQL section of Databricks, it reports the table size as 139.3GiB with 401 files, but the S3 bucket where the files are located (dbfs:/user/hive/warehouse/large_table) is over 110TB and contains over 100K files.&lt;/P&gt;&lt;P&gt;Is it possible to reduce the size of the S3 bucket without losing any data in the table?&lt;/P&gt;</description>
      <pubDate>Mon, 06 Mar 2023 16:27:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-reduce-the-size-of-a-hive-table-s-s3-bucket/m-p/8197#M3905</guid>
      <dc:creator>dotan</dc:creator>
      <dc:date>2023-03-06T16:27:58Z</dc:date>
    </item>
    <item>
      <title>Re: How do I reduce the size of a hive table's S3 bucket</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-reduce-the-size-of-a-hive-table-s-s3-bucket/m-p/8198#M3906</link>
      <description>&lt;P&gt;When you run updates, deletes, etc. on a Delta table, new files are created. However, the old files are not automatically deleted, which is what allows features like time travel to work on Delta tables.&lt;/P&gt;&lt;P&gt;To delete the older files of a Delta table, you can use the VACUUM command. It removes data files that are no longer referenced by the table and are older than the retention threshold, so the current data in the table is kept.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/sql/language-manual/delta-vacuum.html" alt="https://docs.databricks.com/sql/language-manual/delta-vacuum.html" target="_blank"&gt;https://docs.databricks.com/sql/language-manual/delta-vacuum.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Mar 2023 21:52:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-reduce-the-size-of-a-hive-table-s-s3-bucket/m-p/8198#M3906</guid>
      <dc:creator>apingle</dc:creator>
      <dc:date>2023-03-06T21:52:20Z</dc:date>
    </item>
    <item>
      <title>Re: How do I reduce the size of a hive table's S3 bucket</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-reduce-the-size-of-a-hive-table-s-s3-bucket/m-p/8199#M3907</link>
      <description>&lt;P&gt;That's great, thanks. It reduced the size of the bucket from 110TB to 7TB.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Mar 2023 19:47:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-reduce-the-size-of-a-hive-table-s-s3-bucket/m-p/8199#M3907</guid>
      <dc:creator>dotan</dc:creator>
      <dc:date>2023-03-07T19:47:07Z</dc:date>
    </item>
  </channel>
</rss>

