<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: The functionality of table property delta.logRetentionDuration in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/the-functionality-of-table-property-delta-logretentionduration/m-p/20369#M13740</link>
    <description>&lt;P&gt;Hi @Priyanka Mane,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Quick notes&lt;/B&gt;:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You need &lt;B&gt;both&lt;/B&gt; the log files and the data files to time-travel to a previous version.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Vacuum&lt;/B&gt; - does not delete log files. It only deletes data files, which are never deleted automatically; they stay until you run vacuum. Log files, by contrast, are cleaned up automatically after new checkpoints are added.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;logRetentionDuration&lt;/B&gt; - Each time a checkpoint is written, Databricks automatically cleans up log entries older than the retention interval. In your case, &lt;B&gt;when a new checkpoint is written, it clears the logs older than 2 days&lt;/B&gt;. Once this happens, you should no longer be able to time travel to those versions, because their log files are gone.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To delete the data files associated with those logs, you have to run vacuum; there is no other way to delete the data.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;logRetentionDuration takes a calendar interval such as x days or x weeks. Months and years are not accepted.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Finally, none of this takes effect until you run a new transaction that results in a new checkpoint, because that is when logRetentionDuration is applied.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I hope these details help.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Cheers.&lt;/P&gt;</description>
    <pubDate>Mon, 28 Nov 2022 06:48:47 GMT</pubDate>
    <dc:creator>UmaMahesh1</dc:creator>
    <dc:date>2022-11-28T06:48:47Z</dc:date>
    <item>
      <title>The functionality of table property delta.logRetentionDuration</title>
      <link>https://community.databricks.com/t5/data-engineering/the-functionality-of-table-property-delta-logretentionduration/m-p/20368#M13739</link>
      <description>&lt;P&gt;We have a project requirement to keep only 14 days of history for Delta tables. For testing, I set delta.logRetentionDuration = 2 days using the command below:&lt;/P&gt;&lt;P&gt;spark.sql("alter table delta.`[delta_file_path]` set TBLPROPERTIES ('delta.logRetentionDuration'='interval 2 days')")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, when I checked again after the interval had passed (i.e., after two days), I could still time travel back to previous versions. Do we need to run Vacuum after setting this property, or does it work only for intervals &amp;gt;30 days?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Can I please get help on this?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Also, will it physically delete the data files, or will only the log files be deleted?&lt;/P&gt;</description>
      <pubDate>Mon, 28 Nov 2022 05:01:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/the-functionality-of-table-property-delta-logretentionduration/m-p/20368#M13739</guid>
      <dc:creator>Priyanka48</dc:creator>
      <dc:date>2022-11-28T05:01:27Z</dc:date>
    </item>
    <item>
      <title>Re: The functionality of table property delta.logRetentionDuration</title>
      <link>https://community.databricks.com/t5/data-engineering/the-functionality-of-table-property-delta-logretentionduration/m-p/20369#M13740</link>
      <description>&lt;P&gt;Hi @Priyanka Mane,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Quick notes&lt;/B&gt;:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You need &lt;B&gt;both&lt;/B&gt; the log files and the data files to time-travel to a previous version.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Vacuum&lt;/B&gt; - does not delete log files. It only deletes data files, which are never deleted automatically; they stay until you run vacuum. Log files, by contrast, are cleaned up automatically after new checkpoints are added.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;logRetentionDuration&lt;/B&gt; - Each time a checkpoint is written, Databricks automatically cleans up log entries older than the retention interval. In your case, &lt;B&gt;when a new checkpoint is written, it clears the logs older than 2 days&lt;/B&gt;. Once this happens, you should no longer be able to time travel to those versions, because their log files are gone.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To delete the data files associated with those logs, you have to run vacuum; there is no other way to delete the data.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;logRetentionDuration takes a calendar interval such as x days or x weeks. Months and years are not accepted.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Finally, none of this takes effect until you run a new transaction that results in a new checkpoint, because that is when logRetentionDuration is applied.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I hope these details help.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Cheers.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Nov 2022 06:48:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/the-functionality-of-table-property-delta-logretentionduration/m-p/20369#M13740</guid>
      <dc:creator>UmaMahesh1</dc:creator>
      <dc:date>2022-11-28T06:48:47Z</dc:date>
    </item>
    <item>
      <title>Re: The functionality of table property delta.logRetentionDuration</title>
      <link>https://community.databricks.com/t5/data-engineering/the-functionality-of-table-property-delta-logretentionduration/m-p/20370#M13741</link>
      <description>&lt;P&gt;Adding some links for further reading:&lt;/P&gt;&lt;P&gt;&lt;A href="https://mungingdata.com/delta-lake/vacuum-command/" target="test_blank"&gt;https://mungingdata.com/delta-lake/vacuum-command/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;youtube.com/watch?v=F91G4RoA8is&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/delta/history.html" target="test_blank"&gt;https://docs.databricks.com/delta/history.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 28 Nov 2022 06:49:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/the-functionality-of-table-property-delta-logretentionduration/m-p/20370#M13741</guid>
      <dc:creator>UmaMahesh1</dc:creator>
      <dc:date>2022-11-28T06:49:59Z</dc:date>
    </item>
    <item>
      <title>Re: The functionality of table property delta.logRetentionDuration</title>
      <link>https://community.databricks.com/t5/data-engineering/the-functionality-of-table-property-delta-logretentionduration/m-p/20371#M13742</link>
      <description>&lt;P&gt;Hi, by default there is a safety interval enabled. So if you set a retention period lower than that interval (7 days), data in that interval will not be deleted.&lt;/P&gt;&lt;P&gt;You have to specifically override this safety interval by setting spark.databricks.delta.retentionDurationCheck.enabled to false.&lt;/P&gt;&lt;P&gt;Then run vacuum and the data will be gone.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Nov 2022 08:49:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/the-functionality-of-table-property-delta-logretentionduration/m-p/20371#M13742</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-11-28T08:49:10Z</dc:date>
    </item>
    <item>
      <title>Re: The functionality of table property delta.logRetentionDuration</title>
      <link>https://community.databricks.com/t5/data-engineering/the-functionality-of-table-property-delta-logretentionduration/m-p/20372#M13743</link>
      <description>&lt;P&gt;Thanks for the suggestion. I set the log retention duration to 2 days and performed a transaction on the table after 2 days. The older logs were not deleted, and I can still time travel back to previous versions.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Nov 2022 14:06:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/the-functionality-of-table-property-delta-logretentionduration/m-p/20372#M13743</guid>
      <dc:creator>Priyanka48</dc:creator>
      <dc:date>2022-11-28T14:06:50Z</dc:date>
    </item>
  </channel>
</rss>

