<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Designing Reliable Data Versioning Strategies in Databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/154993#M54168</link>
    <description>&lt;P class="p8i6j01 paragraph"&gt;You’re hitting the new 7-day time-travel enforcement: in recent runtimes, time travel is blocked beyond &lt;CODE class="p8i6j0f"&gt;delta.deletedFileRetentionDuration&lt;/CODE&gt; (default 7 days), regardless of &lt;CODE class="p8i6j0f"&gt;VACUUM&lt;/CODE&gt; args or &lt;CODE class="p8i6j0f"&gt;logRetentionDuration&lt;/CODE&gt;.&lt;/P&gt;
&lt;P class="p8i6j01 paragraph"&gt;To support 30 days of historical data for reporting, you have two options:&lt;/P&gt;
&lt;H3 class="_9k2iva0 p8i6j0c _1ibi0s314 heading3 _9k2iva1"&gt;1. Use Delta retention settings for 30-day time travel&lt;/H3&gt;
&lt;P class="p8i6j01 paragraph"&gt;If 30 days is enough history and you’re okay paying for the extra storage, just configure the table’s Delta properties:&lt;/P&gt;
&lt;DIV class="l8rrz21 _1ibi0s3do" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-sql p8i6j0e hljs language-sql _12n1b832"&gt;&lt;SPAN class="hljs-keyword"&gt;ALTER TABLE&lt;/SPAN&gt; catalog.schema.table_name
&lt;SPAN class="hljs-keyword"&gt;SET&lt;/SPAN&gt; TBLPROPERTIES (
  &lt;SPAN class="hljs-string"&gt;'delta.deletedFileRetentionDuration'&lt;/SPAN&gt; &lt;SPAN class="hljs-operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;'interval 30 days'&lt;/SPAN&gt;,
  &lt;SPAN class="hljs-string"&gt;'delta.logRetentionDuration'&lt;/SPAN&gt;        &lt;SPAN class="hljs-operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;'interval 30 days'&lt;/SPAN&gt;
);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="l8rrz23 _1ibi0s3d7 _1ibi0s332 _1ibi0s3dp _1ibi0s3bm _1ibi0s3ce"&gt;
&lt;DIV class="l8rrz25 _1ibi0s3dc"&gt;sql&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;&lt;CODE class="p8i6j0f"&gt;delta.deletedFileRetentionDuration&lt;/CODE&gt; controls how long old data files are kept and therefore the maximum time-travel window (default 7 days).
&lt;DIV class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;&lt;CODE class="p8i6j0f"&gt;delta.logRetentionDuration&lt;/CODE&gt; controls how long the transaction log/history is kept (default 30 days) and must be &lt;STRONG&gt;≥ &lt;CODE class="p8i6j0f"&gt;deletedFileRetentionDuration&lt;/CODE&gt;&lt;/STRONG&gt; in new runtimes.
&lt;DIV class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Docs explicitly note: to access 30 days of history, set &lt;CODE class="p8i6j0f"&gt;delta.deletedFileRetentionDuration = "interval 30 days"&lt;/CODE&gt; to match the default &lt;CODE class="p8i6j0f"&gt;logRetentionDuration&lt;/CODE&gt;.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p8i6j01 paragraph"&gt;This is the simplest way to have a 30-day rolling history directly available via time travel for your trend reports.&lt;/P&gt;
&lt;H3 class="_9k2iva0 p8i6j0c _1ibi0s314 heading3 _9k2iva1"&gt;2. (Recommended for robust reporting) Build a dedicated history / trend table&lt;/H3&gt;
&lt;P class="p8i6j01 paragraph"&gt;Delta docs and guidance caution against relying on table history as a long-term backup or analytics store and recommend using history/time travel mainly for short-term ops and recovery.&lt;/P&gt;
&lt;P class="p8i6j01 paragraph"&gt;For reporting use cases (especially if you may want &amp;gt;30 days later), a common pattern is:&lt;/P&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Create a separate “history” or “snapshot” Delta table, partitioned by a logical date (e.g., &lt;CODE class="p8i6j0f"&gt;snapshot_date&lt;/CODE&gt; or business date).&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;On a daily job (or whatever cadence you need):
&lt;UL class="p8i6j08 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Either:
&lt;UL class="p8i6j09 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Take a point-in-time snapshot of the source table and append it with that day’s &lt;CODE class="p8i6j0f"&gt;snapshot_date&lt;/CODE&gt;, or&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Use Change Data Feed (CDF) to upsert incremental changes into a slowly-changing dimension / fact table with effective dates.
&lt;DIV class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Point your BI / trend reports at this &lt;STRONG&gt;reporting table&lt;/STRONG&gt;, not at time-travel queries.&lt;/LI&gt;
&lt;/UL&gt;
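&lt;P class="p8i6j01 paragraph"&gt;The snapshot branch of the pattern above can be sketched in SQL (table and column names are illustrative; schedule the INSERT as a daily job):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="language-sql"&gt;-- One-time setup: history table partitioned by the snapshot date
CREATE TABLE IF NOT EXISTS catalog.schema.table_name_history
PARTITIONED BY (snapshot_date)
AS SELECT *, current_date() AS snapshot_date
FROM catalog.schema.table_name
WHERE 1 = 0;  -- copies the schema only, no rows

-- Daily job: append today's full snapshot
INSERT INTO catalog.schema.table_name_history
SELECT *, current_date() AS snapshot_date
FROM catalog.schema.table_name;&lt;/CODE&gt;&lt;/PRE&gt;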
&lt;P class="p8i6j01 paragraph"&gt;This pattern keeps reporting predictable and decoupled from operational retention and &lt;CODE class="p8i6j0f"&gt;VACUUM&lt;/CODE&gt; settings, while you still keep ~30 days of native time travel for debugging and recovery.&lt;/P&gt;</description>
    <pubDate>Mon, 20 Apr 2026 20:47:23 GMT</pubDate>
    <dc:creator>DivyaandData</dc:creator>
    <dc:date>2026-04-20T20:47:23Z</dc:date>
    <item>
      <title>Designing Reliable Data Versioning Strategies in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/154987#M54166</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I’m working on a use case where I need to retain &lt;STRONG&gt;30 days of historical data&lt;/STRONG&gt; in a Delta table and use it to build trend reports.&lt;/P&gt;&lt;P&gt;I’m looking for the best approach to reliably maintain this historical data while also making it suitable for reporting. I explored the Delta Lake time travel feature, but it currently only allows access to about &lt;STRONG&gt;7 days of history&lt;/STRONG&gt;, which doesn’t meet my requirement.&lt;/P&gt;&lt;P&gt;What would be the recommended way to handle this scenario?&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2026 18:54:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/154987#M54166</guid>
      <dc:creator>Raj_DB</dc:creator>
      <dc:date>2026-04-20T18:54:50Z</dc:date>
    </item>
    <item>
      <title>Re: Designing Reliable Data Versioning Strategies in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/154992#M54167</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/177179"&gt;@Raj_DB&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;For reliable 30-day trend reporting, don't rely on time travel; use an explicit snapshot table instead.&lt;BR /&gt;The pattern:&lt;BR /&gt;1. A daily job appends a snapshot of your source table to a history table with a snapshot_date partition column.&lt;BR /&gt;2. Purge rows older than 30 days with a scheduled DELETE ... WHERE snapshot_date &amp;lt; current_date() - INTERVAL 30 DAYS.&lt;BR /&gt;3. Run VACUUM afterwards to reclaim storage.&lt;BR /&gt;This gives you full control over retention and keeps trend queries fast via partition pruning.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2026 20:23:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/154992#M54167</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2026-04-20T20:23:06Z</dc:date>
    </item>
    <item>
      <title>Re: Designing Reliable Data Versioning Strategies in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/154993#M54168</link>
      <description>&lt;P class="p8i6j01 paragraph"&gt;You’re hitting the new 7-day time-travel enforcement: in recent runtimes, time travel is blocked beyond &lt;CODE class="p8i6j0f"&gt;delta.deletedFileRetentionDuration&lt;/CODE&gt; (default 7 days), regardless of &lt;CODE class="p8i6j0f"&gt;VACUUM&lt;/CODE&gt; args or &lt;CODE class="p8i6j0f"&gt;logRetentionDuration&lt;/CODE&gt;.&lt;/P&gt;
&lt;P class="p8i6j01 paragraph"&gt;To support 30 days of historical data for reporting, you have two options:&lt;/P&gt;
&lt;H3 class="_9k2iva0 p8i6j0c _1ibi0s314 heading3 _9k2iva1"&gt;1. Use Delta retention settings for 30-day time travel&lt;/H3&gt;
&lt;P class="p8i6j01 paragraph"&gt;If 30 days is enough history and you’re okay paying for the extra storage, just configure the table’s Delta properties:&lt;/P&gt;
&lt;DIV class="l8rrz21 _1ibi0s3do" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-sql p8i6j0e hljs language-sql _12n1b832"&gt;&lt;SPAN class="hljs-keyword"&gt;ALTER TABLE&lt;/SPAN&gt; catalog.schema.table_name
&lt;SPAN class="hljs-keyword"&gt;SET&lt;/SPAN&gt; TBLPROPERTIES (
  &lt;SPAN class="hljs-string"&gt;'delta.deletedFileRetentionDuration'&lt;/SPAN&gt; &lt;SPAN class="hljs-operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;'interval 30 days'&lt;/SPAN&gt;,
  &lt;SPAN class="hljs-string"&gt;'delta.logRetentionDuration'&lt;/SPAN&gt;        &lt;SPAN class="hljs-operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;'interval 30 days'&lt;/SPAN&gt;
);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="l8rrz23 _1ibi0s3d7 _1ibi0s332 _1ibi0s3dp _1ibi0s3bm _1ibi0s3ce"&gt;
&lt;DIV class="l8rrz25 _1ibi0s3dc"&gt;sql&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;&lt;CODE class="p8i6j0f"&gt;delta.deletedFileRetentionDuration&lt;/CODE&gt; controls how long old data files are kept and therefore the maximum time-travel window (default 7 days).
&lt;DIV class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;&lt;CODE class="p8i6j0f"&gt;delta.logRetentionDuration&lt;/CODE&gt; controls how long the transaction log/history is kept (default 30 days) and must be &lt;STRONG&gt;≥ &lt;CODE class="p8i6j0f"&gt;deletedFileRetentionDuration&lt;/CODE&gt;&lt;/STRONG&gt; in new runtimes.
&lt;DIV class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Docs explicitly note: to access 30 days of history, set &lt;CODE class="p8i6j0f"&gt;delta.deletedFileRetentionDuration = "interval 30 days"&lt;/CODE&gt; to match the default &lt;CODE class="p8i6j0f"&gt;logRetentionDuration&lt;/CODE&gt;.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p8i6j01 paragraph"&gt;This is the simplest way to have a 30-day rolling history directly available via time travel for your trend reports.&lt;/P&gt;
&lt;H3 class="_9k2iva0 p8i6j0c _1ibi0s314 heading3 _9k2iva1"&gt;2. (Recommended for robust reporting) Build a dedicated history / trend table&lt;/H3&gt;
&lt;P class="p8i6j01 paragraph"&gt;Delta docs and guidance caution against relying on table history as a long-term backup or analytics store and recommend using history/time travel mainly for short-term ops and recovery.&lt;/P&gt;
&lt;P class="p8i6j01 paragraph"&gt;For reporting use cases (especially if you may want &amp;gt;30 days later), a common pattern is:&lt;/P&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Create a separate “history” or “snapshot” Delta table, partitioned by a logical date (e.g., &lt;CODE class="p8i6j0f"&gt;snapshot_date&lt;/CODE&gt; or business date).&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;On a daily job (or whatever cadence you need):
&lt;UL class="p8i6j08 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Either:
&lt;UL class="p8i6j09 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Take a point-in-time snapshot of the source table and append it with that day’s &lt;CODE class="p8i6j0f"&gt;snapshot_date&lt;/CODE&gt;, or&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Use Change Data Feed (CDF) to upsert incremental changes into a slowly-changing dimension / fact table with effective dates.
&lt;DIV class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Point your BI / trend reports at this &lt;STRONG&gt;reporting table&lt;/STRONG&gt;, not at time-travel queries.&lt;/LI&gt;
&lt;/UL&gt;
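&lt;P class="p8i6j01 paragraph"&gt;The snapshot branch of the pattern above can be sketched in SQL (table and column names are illustrative; schedule the INSERT as a daily job):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="language-sql"&gt;-- One-time setup: history table partitioned by the snapshot date
CREATE TABLE IF NOT EXISTS catalog.schema.table_name_history
PARTITIONED BY (snapshot_date)
AS SELECT *, current_date() AS snapshot_date
FROM catalog.schema.table_name
WHERE 1 = 0;  -- copies the schema only, no rows

-- Daily job: append today's full snapshot
INSERT INTO catalog.schema.table_name_history
SELECT *, current_date() AS snapshot_date
FROM catalog.schema.table_name;&lt;/CODE&gt;&lt;/PRE&gt;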
&lt;P class="p8i6j01 paragraph"&gt;This pattern keeps reporting predictable and decoupled from operational retention and &lt;CODE class="p8i6j0f"&gt;VACUUM&lt;/CODE&gt; settings, while you still keep ~30 days of native time travel for debugging and recovery.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2026 20:47:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/154993#M54168</guid>
      <dc:creator>DivyaandData</dc:creator>
      <dc:date>2026-04-20T20:47:23Z</dc:date>
    </item>
    <item>
      <title>Re: Designing Reliable Data Versioning Strategies in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155017#M54175</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/24053"&gt;@lingareddy_Alva&lt;/a&gt; for your suggestion.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 08:24:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155017#M54175</guid>
      <dc:creator>Raj_DB</dc:creator>
      <dc:date>2026-04-21T08:24:02Z</dc:date>
    </item>
    <item>
      <title>Re: Designing Reliable Data Versioning Strategies in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155080#M54183</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/202813"&gt;@DivyaandData&lt;/a&gt;&amp;nbsp;, Thank you so much for the explanation. I have one question: why is time travel not a good choice, and does it cost more than using a simple append-only history table?&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 12:20:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155080#M54183</guid>
      <dc:creator>Raj_DB</dc:creator>
      <dc:date>2026-04-21T12:20:46Z</dc:date>
    </item>
    <item>
      <title>Re: Designing Reliable Data Versioning Strategies in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155104#M54191</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/177179"&gt;@Raj_DB&lt;/a&gt;, the TL;DR: time travel is great for short-term ops and debugging, but brittle as your primary reporting history, and its cost profile is harder to control and reason about than a purpose-built history table's.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN&gt;Docs &lt;A href="https://docs.databricks.com/aws/en/delta/history" target="_self"&gt;1&lt;/A&gt;,&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/delta/history" target="_self"&gt;2&lt;/A&gt; explicitly say Delta table history/time travel is for auditing, rollback, and point-in-time queries, and is &lt;/SPAN&gt;not recommended as a long-term backup/archival solution&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;SPAN&gt;In new runtimes, time travel is blocked once you go beyond &lt;CODE class="p8i6j0f"&gt;delta.deletedFileRetentionDuration&lt;/CODE&gt; (default 7 days), regardless of VACUUM arguments. &lt;A href="https://docs.databricks.com/aws/en/delta/history" target="_self"&gt;Source&lt;/A&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;Time travel depends on both log and data retention; VACUUM and log cleanup permanently remove older versions. Docs - &lt;A href="https://docs.databricks.com/aws/en/security/privacy/gdpr-delta" target="_self"&gt;1&lt;/A&gt;,&lt;A href="https://docs.databricks.com/aws/en/delta/history" target="_self"&gt;2&lt;/A&gt;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;In terms of cost -&lt;/DIV&gt;
&lt;DIV class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;
&lt;UL&gt;
&lt;LI class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;There’s no separate “time travel fee” – you pay for storage and compute in both patterns. The cost difference comes from how much data you retain and how it’s organized.&lt;/LI&gt;
&lt;LI class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;To get a long time-travel window, you must keep all old data files and logs for the source table for that period (&lt;CODE class="p8i6j0f"&gt;deletedFileRetentionDuration&lt;/CODE&gt; and &lt;CODE class="p8i6j0f"&gt;logRetentionDuration&lt;/CODE&gt;), which increases storage and can slow queries because they scan more versions/files. &lt;A href="https://docs.databricks.com/aws/en/delta/history" target="_self"&gt;Source&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;With a dedicated history table, you can:&lt;/P&gt;
&lt;UL class="p8i6j08 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Store only what reporting needs (e.g., daily snapshots or SCD facts instead of every tiny intermediate write).&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Partition and compact for reporting patterns.&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Apply your own retention (e.g., 13 months) without affecting operational tables or being surprised by VACUUM behavior.&lt;/LI&gt;
&lt;/UL&gt;
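&lt;P&gt;For example, retention on a hypothetical &lt;CODE class="p8i6j0f"&gt;table_name_history&lt;/CODE&gt; snapshot table is just a scheduled statement you own, independent of the source table's Delta retention settings (the 13-month horizon is only the example figure from above):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class="language-sql"&gt;-- Keep only the reporting horizon; a later VACUUM reclaims the files
DELETE FROM catalog.schema.table_name_history
WHERE snapshot_date &amp;lt; add_months(current_date(), -13);&lt;/CODE&gt;&lt;/PRE&gt;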
&lt;/DIV&gt;</description>
      <pubDate>Tue, 21 Apr 2026 15:18:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155104#M54191</guid>
      <dc:creator>DivyaandData</dc:creator>
      <dc:date>2026-04-21T15:18:37Z</dc:date>
    </item>
  </channel>
</rss>

