<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Retention window from DLT created Delta tables in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/retention-window-from-dlt-created-delta-tables/m-p/99076#M39900</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/89478"&gt;@MuthuLakshmi&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for sharing the configurations. Here is a bit more clarity on our current workflow.&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;FONT color="#FF0000"&gt;&lt;FONT color="#000000"&gt;&lt;STRONG&gt;DELETE and VACUUM Workflow&lt;/STRONG&gt;&lt;BR /&gt;Our workflow involves the following:&lt;BR /&gt;1. &lt;STRONG&gt;DELETE Operation&lt;/STRONG&gt;:&lt;BR /&gt;We delete records matching a specific predicate to mark their corresponding files for deletion. For example:&lt;BR /&gt;&lt;BR /&gt;&lt;FONT size="4"&gt;DELETE FROM &amp;lt;table_name&amp;gt; WHERE LogSubmissionDate &amp;lt; current_date() - INTERVAL 14 DAYS; &amp;nbsp;&lt;/FONT&gt;&lt;BR /&gt;&lt;A title="delta table delete, updates and merge" href="https://docs.delta.io/latest/delta-update.html#delete-from-a-table" target="_blank" rel="noopener"&gt;https://docs.delta.io/latest/delta-update.html#delete-from-a-table&lt;/A&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;FONT color="#FF0000"&gt;&lt;FONT color="#000000"&gt;&lt;BR /&gt;2. &lt;STRONG&gt;VACUUM Operation&lt;/STRONG&gt;:&lt;BR /&gt;After the &lt;STRONG&gt;DELETE&lt;/STRONG&gt; operation, we run &lt;STRONG&gt;VACUUM&lt;/STRONG&gt; to permanently remove the files marked as deleted, including any associated data in the change_data folder:&lt;BR /&gt;&lt;BR /&gt;&lt;FONT size="4"&gt;VACUUM &amp;lt;table_name&amp;gt; RETAIN 0 HOURS;&lt;/FONT&gt; &amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Understanding&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;• If no DELETE or UPDATE operation has occurred, running VACUUM or setting any of the configurations (delta.deletedFileRetentionDuration or delta.logRetentionDuration) will have &lt;STRONG&gt;no effect&lt;/STRONG&gt;.&lt;BR /&gt;• This is because VACUUM only acts on files that are no longer referenced in the transaction log, which requires a prior operation (like DELETE or UPDATE) to mark files for deletion.&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Question&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;Given this understanding, if we solely rely on the configurations&amp;nbsp;delta.deletedFileRetentionDuration or delta.logRetentionDuration without explicitly performing a &lt;STRONG&gt;DELETE&lt;/STRONG&gt; operation, would this achieve the same result as out current DELETE-VACUUM workflow in terms of cleaning up old data?&lt;/FONT&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;FONT color="#FF0000"&gt;&lt;FONT color="#000000"&gt;&lt;BR /&gt;Thank you for your insights!&lt;/FONT&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 18 Nov 2024 06:57:47 GMT</pubDate>
    <dc:creator>TinasheChinyati</dc:creator>
    <dc:date>2024-11-18T06:57:47Z</dc:date>
    <item>
      <title>Retention window from DLT created Delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/retention-window-from-dlt-created-delta-tables/m-p/98752#M39833</link>
      <description>&lt;P&gt;Hi guys&lt;/P&gt;&lt;P&gt;I am working with data ingested from Azure EventHub using Delta Live Tables in databricks. Our data architecture includes the medallion approach. Our current requirement is to retain only the most recent 14 days of data in the silver layer. To meet this retention requirement, I am considering using &lt;STRONG&gt;DELETE&lt;/STRONG&gt; command to remove records older than 14 days based on a specific date field, followed by a &lt;STRONG&gt;VACUUM&lt;/STRONG&gt; operation to reclaim storage space. This process would be scheduled to run daily.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you confirm if this approach is optimal for maintaining a 14-day retention window in the silver layer? Additionally would this method effectively manage storage costs while preserving Delta Live Tables' integrity for incremental processing? Since the files will still exist in the external storage account, I can then Life Cycle Management Policy to delete that data?&lt;/P&gt;&lt;P&gt;Thank you for your guidance on best practices for this setup.&lt;/P&gt;</description>
      <pubDate>Thu, 14 Nov 2024 08:54:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/retention-window-from-dlt-created-delta-tables/m-p/98752#M39833</guid>
      <dc:creator>TinasheChinyati</dc:creator>
      <dc:date>2024-11-14T08:54:48Z</dc:date>
    </item>
    <item>
      <title>Re: Retention window from DLT created Delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/retention-window-from-dlt-created-delta-tables/m-p/98810#M39848</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/7091"&gt;@TinasheChinyati&lt;/a&gt;&amp;nbsp;In-order to retain the 14 days history of a table you have tune the below parameters.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To query a previous table version, you must retain&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;EM&gt;both&lt;/EM&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;the log and the data files for that version.&lt;/P&gt;
&lt;P&gt;Data files are deleted when&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;VACUUM&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;runs against a table. Delta Lake manages log file removal automatically after checkpointing table versions.&lt;/P&gt;
&lt;P&gt;Because most Delta tables have&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;VACUUM&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;run against them regularly, point-in-time queries should respect the retention threshold for&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;VACUUM&lt;/SPAN&gt;&lt;/CODE&gt;, which is 7 days by default.&lt;/P&gt;
&lt;P&gt;In order to increase the data retention threshold for Delta tables, you must configure the following table properties:&lt;/P&gt;
&lt;UL class="simple"&gt;
&lt;LI&gt;
&lt;P&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;delta.logRetentionDuration&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="pre"&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="pre"&gt;"interval&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="pre"&gt;&amp;lt;interval&amp;gt;"&lt;/SPAN&gt;&lt;/CODE&gt;: controls how long the history for a table is kept. The default is&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;interval&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="pre"&gt;30&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="pre"&gt;days&lt;/SPAN&gt;&lt;/CODE&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;delta.deletedFileRetentionDuration&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="pre"&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="pre"&gt;"interval&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="pre"&gt;&amp;lt;interval&amp;gt;"&lt;/SPAN&gt;&lt;/CODE&gt;: determines the threshold&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;VACUUM&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;uses to remove data files no longer referenced in the current table version. The default is&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;interval&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="pre"&gt;7&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="pre"&gt;days&lt;/SPAN&gt;&lt;/CODE&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN&gt;You must set both of these properties to ensure table history is retained for longer duration for tables with frequent&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;VACUUM&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;operations. For example, to access 30 days of historical data, set&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;delta.deletedFileRetentionDuration&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="pre"&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="pre"&gt;"interval&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="pre"&gt;30&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="pre"&gt;days"&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;(which matches the default setting for&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;delta.logRetentionDuration&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN&gt;).&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;In your case to retain 14 days records, keep the delta.deletedFileRetentionDuration = "interval 14 days"&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/delta/history.html" target="_blank"&gt;https://docs.databricks.com/en/delta/history.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 14 Nov 2024 13:47:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/retention-window-from-dlt-created-delta-tables/m-p/98810#M39848</guid>
      <dc:creator>MuthuLakshmi</dc:creator>
      <dc:date>2024-11-14T13:47:11Z</dc:date>
    </item>
    <item>
      <title>Re: Retention window from DLT created Delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/retention-window-from-dlt-created-delta-tables/m-p/99076#M39900</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/89478"&gt;@MuthuLakshmi&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for sharing the configurations. Here is a bit more clarity on our current workflow.&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;FONT color="#FF0000"&gt;&lt;FONT color="#000000"&gt;&lt;STRONG&gt;DELETE and VACUUM Workflow&lt;/STRONG&gt;&lt;BR /&gt;Our workflow involves the following:&lt;BR /&gt;1. &lt;STRONG&gt;DELETE Operation&lt;/STRONG&gt;:&lt;BR /&gt;We delete records matching a specific predicate to mark their corresponding files for deletion. For example:&lt;BR /&gt;&lt;BR /&gt;&lt;FONT size="4"&gt;DELETE FROM &amp;lt;table_name&amp;gt; WHERE LogSubmissionDate &amp;lt; current_date() - INTERVAL 14 DAYS; &amp;nbsp;&lt;/FONT&gt;&lt;BR /&gt;&lt;A title="delta table delete, updates and merge" href="https://docs.delta.io/latest/delta-update.html#delete-from-a-table" target="_blank" rel="noopener"&gt;https://docs.delta.io/latest/delta-update.html#delete-from-a-table&lt;/A&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;FONT color="#FF0000"&gt;&lt;FONT color="#000000"&gt;&lt;BR /&gt;2. &lt;STRONG&gt;VACUUM Operation&lt;/STRONG&gt;:&lt;BR /&gt;After the &lt;STRONG&gt;DELETE&lt;/STRONG&gt; operation, we run &lt;STRONG&gt;VACUUM&lt;/STRONG&gt; to permanently remove the files marked as deleted, including any associated data in the change_data folder:&lt;BR /&gt;&lt;BR /&gt;&lt;FONT size="4"&gt;VACUUM &amp;lt;table_name&amp;gt; RETAIN 0 HOURS;&lt;/FONT&gt; &amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Understanding&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;• If no DELETE or UPDATE operation has occurred, running VACUUM or setting any of the configurations (delta.deletedFileRetentionDuration or delta.logRetentionDuration) will have &lt;STRONG&gt;no effect&lt;/STRONG&gt;.&lt;BR /&gt;• This is because VACUUM only acts on files that are no longer referenced in the transaction log, which requires a prior operation (like DELETE or UPDATE) to mark files for deletion.&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Question&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;Given this understanding, if we solely rely on the configurations&amp;nbsp;delta.deletedFileRetentionDuration or delta.logRetentionDuration without explicitly performing a &lt;STRONG&gt;DELETE&lt;/STRONG&gt; operation, would this achieve the same result as out current DELETE-VACUUM workflow in terms of cleaning up old data?&lt;/FONT&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;FONT color="#FF0000"&gt;&lt;FONT color="#000000"&gt;&lt;BR /&gt;Thank you for your insights!&lt;/FONT&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Nov 2024 06:57:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/retention-window-from-dlt-created-delta-tables/m-p/99076#M39900</guid>
      <dc:creator>TinasheChinyati</dc:creator>
      <dc:date>2024-11-18T06:57:47Z</dc:date>
    </item>
    <item>
      <title>Re: Retention window from DLT created Delta tables</title>
      <link>https://community.databricks.com/t5/data-engineering/retention-window-from-dlt-created-delta-tables/m-p/99084#M39903</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/7091"&gt;@TinasheChinyati&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Based on your inputs my understanding is that DELETE is for deleting old records from the delta table (history data).&amp;nbsp;&lt;BR /&gt;If that's the case, &lt;STRONG&gt;DELETE&lt;/STRONG&gt; followed by &lt;STRONG&gt;VACUUM&lt;/STRONG&gt; with LogRetention and DeletedFileRetentionDuration will remove the underlying data files.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Nov 2024 08:47:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/retention-window-from-dlt-created-delta-tables/m-p/99084#M39903</guid>
      <dc:creator>MuthuLakshmi</dc:creator>
      <dc:date>2024-11-18T08:47:23Z</dc:date>
    </item>
  </channel>
</rss>

