<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Designing Reliable Data Versioning Strategies in Databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155121#M54194</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Time travel is useful for operational recovery, debugging, and short-term rollback, but it’s usually not the best fit for reporting. Why? Because reporting needs a stable, historical model. Time travel gives you old table versions, but not a business-friendly history structure such as snapshot dates, effective dates, change types, or pre-modeled trends. It is also coupled to retention settings and file lifecycle behavior, including VACUUM; note that for Unity Catalog managed tables, predictive optimization runs VACUUM automatically.&lt;/P&gt;</description>
    <pubDate>Tue, 21 Apr 2026 17:41:17 GMT</pubDate>
    <dc:creator>amirabedhiafi</dc:creator>
    <dc:date>2026-04-21T17:41:17Z</dc:date>
    <item>
      <title>Designing Reliable Data Versioning Strategies in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/154987#M54166</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I’m working on a use case where I need to retain &lt;STRONG&gt;30 days of historical data&lt;/STRONG&gt; in a Delta table and use it to build trend reports.&lt;/P&gt;&lt;P&gt;I’m looking for the best approach to reliably maintain this historical data while also making it suitable for reporting. I explored the Delta Lake time travel feature, but it currently only allows access to about &lt;STRONG&gt;7 days of history&lt;/STRONG&gt;, which doesn’t meet my requirement.&lt;/P&gt;&lt;P&gt;What would be the recommended way to handle this scenario?&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2026 18:54:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/154987#M54166</guid>
      <dc:creator>Raj_DB</dc:creator>
      <dc:date>2026-04-20T18:54:50Z</dc:date>
    </item>
    <item>
      <title>Re: Designing Reliable Data Versioning Strategies in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/154992#M54167</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/177179"&gt;@Raj_DB&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;For reliable 30-day trend reporting, don't rely on time travel; use an explicit snapshot table instead.&lt;BR /&gt;The pattern:&lt;BR /&gt;1. A daily job appends a snapshot of your source table to a history table with a snapshot_date partition column.&lt;BR /&gt;2. Purge rows older than 30 days with a scheduled DELETE FROM the history table WHERE snapshot_date &amp;lt; current_date() - INTERVAL 30 DAYS.&lt;BR /&gt;3. Run VACUUM afterwards to reclaim storage.&lt;BR /&gt;This gives you full control over retention and keeps trend queries fast via partition pruning.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2026 20:23:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/154992#M54167</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2026-04-20T20:23:06Z</dc:date>
    </item>
    <item>
      <title>Re: Designing Reliable Data Versioning Strategies in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/154993#M54168</link>
      <description>&lt;P class="p8i6j01 paragraph"&gt;You’re hitting the new 7-day time-travel enforcement: in recent runtimes, time travel is blocked beyond &lt;CODE class="p8i6j0f"&gt;delta.deletedFileRetentionDuration&lt;/CODE&gt; (default 7 days), regardless of &lt;CODE class="p8i6j0f"&gt;VACUUM&lt;/CODE&gt; args or &lt;CODE class="p8i6j0f"&gt;logRetentionDuration&lt;/CODE&gt;.&lt;/P&gt;
&lt;P class="p8i6j01 paragraph"&gt;To support 30 days of historical data for reporting, you have two options:&lt;/P&gt;
&lt;H3 class="_9k2iva0 p8i6j0c _1ibi0s314 heading3 _9k2iva1"&gt;1. Use Delta retention settings for 30-day time travel&lt;/H3&gt;
&lt;P class="p8i6j01 paragraph"&gt;If 30 days is enough history and you’re okay paying for the extra storage, just configure the table’s Delta properties:&lt;/P&gt;
&lt;DIV class="l8rrz21 _1ibi0s3do" data-ui-element="code-block-container"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-sql p8i6j0e hljs language-sql _12n1b832"&gt;&lt;SPAN class="hljs-keyword"&gt;ALTER TABLE&lt;/SPAN&gt; catalog.schema.table_name
&lt;SPAN class="hljs-keyword"&gt;SET&lt;/SPAN&gt; TBLPROPERTIES (
  &lt;SPAN class="hljs-string"&gt;'delta.deletedFileRetentionDuration'&lt;/SPAN&gt; &lt;SPAN class="hljs-operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;'interval 30 days'&lt;/SPAN&gt;,
  &lt;SPAN class="hljs-string"&gt;'delta.logRetentionDuration'&lt;/SPAN&gt;        &lt;SPAN class="hljs-operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;'interval 30 days'&lt;/SPAN&gt;
);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="l8rrz23 _1ibi0s3d7 _1ibi0s332 _1ibi0s3dp _1ibi0s3bm _1ibi0s3ce"&gt;
&lt;DIV class="l8rrz25 _1ibi0s3dc"&gt;sql&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;&lt;CODE class="p8i6j0f"&gt;delta.deletedFileRetentionDuration&lt;/CODE&gt; controls how long old data files are kept and therefore the maximum time-travel window (default 7 days).
&lt;DIV class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;&lt;CODE class="p8i6j0f"&gt;delta.logRetentionDuration&lt;/CODE&gt; controls how long the transaction log/history is kept (default 30 days) and must be &lt;STRONG&gt;≥ &lt;CODE class="p8i6j0f"&gt;deletedFileRetentionDuration&lt;/CODE&gt;&lt;/STRONG&gt; in new runtimes.
&lt;DIV class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Docs explicitly note: to access 30 days of history, set &lt;CODE class="p8i6j0f"&gt;delta.deletedFileRetentionDuration = "interval 30 days"&lt;/CODE&gt; to match the default &lt;CODE class="p8i6j0f"&gt;logRetentionDuration&lt;/CODE&gt;.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="p8i6j01 paragraph"&gt;This is the simplest way to have a 30-day rolling history directly available via time travel for your trend reports.&lt;/P&gt;
&lt;H3 class="_9k2iva0 p8i6j0c _1ibi0s314 heading3 _9k2iva1"&gt;2. (Recommended for robust reporting) Build a dedicated history / trend table&lt;/H3&gt;
&lt;P class="p8i6j01 paragraph"&gt;Delta docs and guidance caution against relying on table history as a long-term backup or analytics store and recommend using history/time travel mainly for short-term ops and recovery.&lt;/P&gt;
&lt;P class="p8i6j01 paragraph"&gt;For reporting use cases (especially if you may want &amp;gt;30 days later), a common pattern is:&lt;/P&gt;
&lt;UL class="p8i6j07 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Create a separate “history” or “snapshot” Delta table, partitioned by a logical date (e.g., &lt;CODE class="p8i6j0f"&gt;snapshot_date&lt;/CODE&gt; or business date).&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;On a daily job (or whatever cadence you need):
&lt;UL class="p8i6j08 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Either:
&lt;UL class="p8i6j09 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Take a point-in-time snapshot of the source table and append it with that day’s &lt;CODE class="p8i6j0f"&gt;snapshot_date&lt;/CODE&gt;, or&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Use Change Data Feed (CDF) to upsert incremental changes into a slowly-changing dimension / fact table with effective dates.
&lt;DIV class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Point your BI / trend reports at this &lt;STRONG&gt;reporting table&lt;/STRONG&gt;, not at time-travel queries.&lt;/LI&gt;
&lt;/UL&gt;
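&lt;P class="p8i6j01 paragraph"&gt;A minimal sketch of the daily snapshot variant, with illustrative table names (&lt;CODE class="p8i6j0f"&gt;prod.sales&lt;/CODE&gt; and &lt;CODE class="p8i6j0f"&gt;reporting.sales_history&lt;/CODE&gt; are placeholders):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;-- Daily job: append today's point-in-time snapshot of the source table
INSERT INTO reporting.sales_history
SELECT *, current_date() AS snapshot_date
FROM prod.sales;

-- Purge snapshots that fall outside the 30-day reporting window
DELETE FROM reporting.sales_history
WHERE snapshot_date &amp;lt; current_date() - INTERVAL 30 DAYS;&lt;/CODE&gt;&lt;/PRE&gt;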
&lt;P class="p8i6j01 paragraph"&gt;This pattern keeps reporting predictable and decoupled from operational retention and &lt;CODE class="p8i6j0f"&gt;VACUUM&lt;/CODE&gt; settings, while you still keep ~30 days of native time travel for debugging and recovery.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2026 20:47:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/154993#M54168</guid>
      <dc:creator>DivyaandData</dc:creator>
      <dc:date>2026-04-20T20:47:23Z</dc:date>
    </item>
    <item>
      <title>Re: Designing Reliable Data Versioning Strategies in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155017#M54175</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/24053"&gt;@lingareddy_Alva&lt;/a&gt;&amp;nbsp; for your suggestion.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 08:24:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155017#M54175</guid>
      <dc:creator>Raj_DB</dc:creator>
      <dc:date>2026-04-21T08:24:02Z</dc:date>
    </item>
    <item>
      <title>Re: Designing Reliable Data Versioning Strategies in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155080#M54183</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/202813"&gt;@DivyaandData&lt;/a&gt;&amp;nbsp;, Thank you so much for the explanation. I have one question: why is time travel not a good choice, and does it cost more than using a simple append-only history table?&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 12:20:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155080#M54183</guid>
      <dc:creator>Raj_DB</dc:creator>
      <dc:date>2026-04-21T12:20:46Z</dc:date>
    </item>
    <item>
      <title>Re: Designing Reliable Data Versioning Strategies in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155104#M54191</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/177179"&gt;@Raj_DB&lt;/a&gt;&amp;nbsp;, The TL;DR: time travel is great for short-term ops and debugging, but brittle as your primary reporting history, and its cost profile is harder to control and reason about than that of a purpose-built history table.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN&gt;Docs &lt;A href="https://docs.databricks.com/aws/en/delta/history" target="_self"&gt;1&lt;/A&gt;,&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/delta/history" target="_self"&gt;2&lt;/A&gt; explicitly say Delta table history/time travel is for auditing, rollback, and point-in-time queries, and is &lt;/SPAN&gt;not recommended as a long-term backup/archival solution&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;SPAN&gt;In new runtimes, time travel is blocked once you go beyond &lt;CODE class="p8i6j0f"&gt;delta.deletedFileRetentionDuration&lt;/CODE&gt; (default 7 days), regardless of VACUUM arguments. &lt;A href="https://docs.databricks.com/aws/en/delta/history" target="_self"&gt;Source&lt;/A&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;Time travel depends on both log and data retention; VACUUM and log cleanup permanently remove older versions. Docs - &lt;A href="https://docs.databricks.com/aws/en/security/privacy/gdpr-delta" target="_self"&gt;1&lt;/A&gt;,&lt;A href="https://docs.databricks.com/aws/en/delta/history" target="_self"&gt;2&lt;/A&gt;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;In terms of cost -&lt;/DIV&gt;
&lt;DIV class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;
&lt;UL&gt;
&lt;LI class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;There’s no separate “time travel fee” – you pay for storage and compute in both patterns. The cost difference comes from how much data you retain and how it’s organized.&lt;/LI&gt;
&lt;LI class="tk0j8o1 _1ibi0s31a _1ibi0s3do"&gt;To get a long time-travel window, you must keep all old data files and logs for the source table for that period (&lt;CODE class="p8i6j0f"&gt;deletedFileRetentionDuration&lt;/CODE&gt; and &lt;CODE class="p8i6j0f"&gt;logRetentionDuration&lt;/CODE&gt;), which increases storage and can slow queries because they scan more versions/files. &lt;A href="https://docs.databricks.com/aws/en/delta/history" target="_self"&gt;Source&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;With a dedicated history table, you can:&lt;/P&gt;
&lt;UL class="p8i6j08 p8i6j02"&gt;
&lt;LI class="p8i6j0a"&gt;Store only what reporting needs (e.g., daily snapshots or SCD facts instead of every tiny intermediate write).&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Partition and compact for reporting patterns.&lt;/LI&gt;
&lt;LI class="p8i6j0a"&gt;Apply your own retention (e.g., 13 months) without affecting operational tables or being surprised by VACUUM behavior.&lt;/LI&gt;
&lt;/UL&gt;
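&lt;P&gt;For example (illustrative names again), the history table can be created and retained on its own terms, independent of the source table's VACUUM settings; this is a sketch under those assumptions:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;-- Dedicated reporting table, partitioned for trend queries
CREATE TABLE IF NOT EXISTS reporting.sales_history (
  order_id BIGINT,        -- illustrative source columns
  amount DECIMAL(18, 2),
  snapshot_date DATE      -- logical date each row was captured
)
PARTITIONED BY (snapshot_date);

-- Your own retention rule (e.g. 13 months), enforced by a scheduled job
DELETE FROM reporting.sales_history
WHERE snapshot_date &amp;lt; add_months(current_date(), -13);&lt;/CODE&gt;&lt;/PRE&gt;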
&lt;/DIV&gt;</description>
      <pubDate>Tue, 21 Apr 2026 15:18:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155104#M54191</guid>
      <dc:creator>DivyaandData</dc:creator>
      <dc:date>2026-04-21T15:18:37Z</dc:date>
    </item>
    <item>
      <title>Re: Designing Reliable Data Versioning Strategies in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155121#M54194</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Time travel is useful for operational recovery, debugging, and short-term rollback, but it’s usually not the best fit for reporting. Why? Because reporting needs a stable, historical model. Time travel gives you old table versions, but not a business-friendly history structure such as snapshot dates, effective dates, change types, or pre-modeled trends. It is also coupled to retention settings and file lifecycle behavior, including VACUUM; note that for Unity Catalog managed tables, predictive optimization runs VACUUM automatically.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 17:41:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155121#M54194</guid>
      <dc:creator>amirabedhiafi</dc:creator>
      <dc:date>2026-04-21T17:41:17Z</dc:date>
    </item>
    <item>
      <title>Re: Designing Reliable Data Versioning Strategies in Databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155171#M54201</link>
      <description>&lt;P&gt;Thank you, &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/202813"&gt;@DivyaandData&lt;/a&gt;&amp;nbsp;, for the clear and detailed explanation - I really appreciate it.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Apr 2026 08:26:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/designing-reliable-data-versioning-strategies-in-databricks/m-p/155171#M54201</guid>
      <dc:creator>Raj_DB</dc:creator>
      <dc:date>2026-04-22T08:26:37Z</dc:date>
    </item>
  </channel>
</rss>

