<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta Table - Reduce time travel storage size in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-table-reduce-time-travel-storage-size/m-p/27980#M19818</link>
    <description>&lt;P&gt;Thank you @Werner Stinckens for your reply. However, I still haven't managed to delete the history even after setting the option below. The number of history rows remains the same when running "DESCRIBE HISTORY".&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;SET spark.databricks.delta.retentionDurationCheck.enabled = false&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What I am actually trying to do is remove older history records from the Delta table history. Also, is there a minimum and maximum retention period you can have with time travel?&lt;/P&gt;</description>
    <pubDate>Wed, 12 Oct 2022 07:35:57 GMT</pubDate>
    <dc:creator>elgeo</dc:creator>
    <dc:date>2022-10-12T07:35:57Z</dc:date>
    <item>
      <title>Delta Table - Reduce time travel storage size</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-reduce-time-travel-storage-size/m-p/27978#M19816</link>
      <description>&lt;P&gt;Hello! I am trying to understand the time travel feature. I see with the "DESCRIBE HISTORY" command that the entire transaction history of a specific table is recorded by version and timestamp. However, I understand that this occupies a lot of storage, especially if a table is updated every day. Is there a way to remove history or reduce the retention period? What are the minimum and maximum retention periods you can have with time travel? I tried the commands below, but "DESCRIBE HISTORY" didn't return different results.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;ALTER TABLE table_name&lt;/P&gt;&lt;P&gt;SET TBLPROPERTIES ('delta.logRetentionDuration'='interval 1 HOURS', 'delta.deletedFileRetentionDuration'='interval 1 HOURS')&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;VACUUM table_name&lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2022 12:39:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-reduce-time-travel-storage-size/m-p/27978#M19816</guid>
      <dc:creator>elgeo</dc:creator>
      <dc:date>2022-10-11T12:39:06Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Table - Reduce time travel storage size</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-reduce-time-travel-storage-size/m-p/27979#M19817</link>
      <description>&lt;P&gt;You are almost there. From the help page:&lt;/P&gt;&lt;P&gt;&lt;I&gt;Delta Lake has a safety check to prevent you from running a dangerous VACUUM command. If you are certain that there are no operations being performed on this table that take longer than the retention interval you plan to specify, you can turn off this safety check by setting the Spark configuration property spark.databricks.delta.retentionDurationCheck.enabled to false.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Also:&lt;/P&gt;&lt;P&gt;&lt;I&gt;It is recommended that you set a retention interval to be at least 7 days, because old snapshots and uncommitted files can still be in use by concurrent readers or writers to the table. If VACUUM cleans up active files, concurrent readers can fail or, worse, tables can be corrupted when VACUUM deletes files that have not yet been committed. You must choose an interval that is longer than the longest running concurrent transaction and the longest period that any stream can lag behind the most recent update to the table.&lt;/I&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 11 Oct 2022 13:02:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-reduce-time-travel-storage-size/m-p/27979#M19817</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-10-11T13:02:22Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Table - Reduce time travel storage size</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-reduce-time-travel-storage-size/m-p/27980#M19818</link>
      <description>&lt;P&gt;Thank you @Werner Stinckens for your reply. However, I still haven't managed to delete the history even after setting the option below. The number of history rows remains the same when running "DESCRIBE HISTORY".&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;SET spark.databricks.delta.retentionDurationCheck.enabled = false&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What I am actually trying to do is remove older history records from the Delta table history. Also, is there a minimum and maximum retention period you can have with time travel?&lt;/P&gt;</description>
      <pubDate>Wed, 12 Oct 2022 07:35:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-reduce-time-travel-storage-size/m-p/27980#M19818</guid>
      <dc:creator>elgeo</dc:creator>
      <dc:date>2022-10-12T07:35:57Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Table - Reduce time travel storage size</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-table-reduce-time-travel-storage-size/m-p/27981#M19819</link>
      <description>&lt;P&gt;You will have to define the retention interval when running the vacuum:&lt;/P&gt;&lt;P&gt;VACUUM table_name [RETAIN num HOURS]&lt;/P&gt;&lt;P&gt;There is also a dry run option.&lt;/P&gt;&lt;P&gt;You can go down to 0 hours. That way, all history is deleted. I do not know of a maximum value; 30 days is possible for sure, but I have never tested with more than that.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Oct 2022 07:47:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-table-reduce-time-travel-storage-size/m-p/27981#M19819</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-10-12T07:47:08Z</dc:date>
    </item>
  </channel>
</rss>

