yesterday
Hi everyone,
I'm working on a use case where I need to retain 30 days of historical data in a Delta table and use it to build trend reports.
I'm looking for the best approach to reliably maintain this historical data while also making it suitable for reporting. I explored the Delta Lake time travel feature, but it currently only allows access to about 7 days of history, which doesn't meet my requirement.
What would be the recommended way to handle this scenario?
Thank you!
yesterday - last edited yesterday
Hi @Raj_DB ,
For reliable 30-day trend reporting, don't rely on time travel; use an explicit snapshot table instead.
The pattern:
1. Daily job appends a snapshot of your source table to a history table with a snapshot_date partition column
2. Purge rows older than 30 days with a scheduled DELETE WHERE snapshot_date < current_date() - INTERVAL 30 DAYS
3. Run VACUUM after to reclaim storage.
This gives you full control over retention and keeps trend queries fast via partition pruning.
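The retention window in step 2 can be sketched in plain Python to show exactly which snapshot_date partitions a 30-day purge keeps (the dates and the helper name are illustrative, not Databricks APIs):

```python
from datetime import date, timedelta

def partitions_to_keep(snapshot_dates, today, retention_days=30):
    """Mirror of: DELETE WHERE snapshot_date < today - INTERVAL retention_days DAYS.
    Rows with snapshot_date >= cutoff survive the purge."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in snapshot_dates if d >= cutoff]

# Example: 40 daily snapshots exist; the purge keeps today plus the previous 30.
today = date(2024, 6, 30)
snapshot_dates = [today - timedelta(days=i) for i in range(40)]
kept = partitions_to_keep(snapshot_dates, today)
```

Note that `<` in the DELETE predicate means the cutoff day itself survives, so you end up with 31 daily partitions, not 30.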
15 hours ago
Thanks @lingareddy_Alva for your suggestion.
yesterday
You're hitting the new 7-day time-travel enforcement: in recent runtimes, time travel is blocked beyond delta.deletedFileRetentionDuration (default 7 days), regardless of VACUUM arguments or logRetentionDuration.
To support 30 days of historical data for reporting, you have two options:
If 30 days is enough history and you're okay paying for the extra storage, just configure the table's Delta properties:
ALTER TABLE catalog.schema.table_name
SET TBLPROPERTIES (
'delta.deletedFileRetentionDuration' = 'interval 30 days',
'delta.logRetentionDuration' = 'interval 30 days'
);
delta.deletedFileRetentionDuration controls how long old data files are kept and therefore the maximum time-travel window (default 7 days).
delta.logRetentionDuration controls how long the transaction log/history is kept (default 30 days) and must be ≥ deletedFileRetentionDuration in new runtimes.
Setting delta.deletedFileRetentionDuration to 'interval 30 days' matches the default logRetentionDuration. This is the simplest way to have a 30-day rolling history directly available via time travel for your trend reports.
Delta docs and guidance caution against relying on table history as a long-term backup or analytics store and recommend using history/time travel mainly for short-term ops and recovery.
For reporting use cases (especially if you may want >30 days later), a common pattern is:
For reporting use cases (especially if you may want >30 days later), a common pattern is:
1. Append a daily snapshot to a dedicated history table, keyed by a date column (snapshot_date or business date).
2. Partition the history table by snapshot_date, or pre-aggregate the trends you report on.
This pattern keeps reporting predictable and decoupled from operational retention and VACUUM settings, while you still keep ~30 days of native time travel for debugging and recovery.
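To illustrate why the snapshot table keeps trend queries simple, here is a plain-Python stand-in for a GROUP BY snapshot_date trend query over the history table (the rows, column names, and metric are made up for the sketch):

```python
from collections import defaultdict
from datetime import date

# Toy rows from a history table: (snapshot_date, metric) pairs.
rows = [
    (date(2024, 6, 1), 100),
    (date(2024, 6, 1), 50),
    (date(2024, 6, 2), 120),
    (date(2024, 6, 3), 90),
]

def daily_trend(rows):
    """Equivalent of: SELECT snapshot_date, SUM(metric)
       FROM history GROUP BY snapshot_date ORDER BY snapshot_date."""
    totals = defaultdict(int)
    for snapshot_date, metric in rows:
        totals[snapshot_date] += metric
    return dict(sorted(totals.items()))

trend = daily_trend(rows)
```

Because each day's state is an explicit row set keyed by snapshot_date, the trend query never depends on table versions or retention settings.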
11 hours ago
Hi @DivyaandData , Thank you so much for the explanation. I have one question: why is time travel not a good choice, and does it cost more than using a simple append-only history table?
5 hours ago
Hello,
Time travel is useful for operational recovery, debugging, and short-term rollback, but it's usually not the best fit for reporting. Why? Because reporting needs a stable, historical model. Time travel gives you old table versions, but it does not give you a business-friendly history structure such as snapshot dates, effective dates, change types, or pre-modeled trends. It is also coupled to retention settings and file lifecycle behavior, including VACUUM; on Unity Catalog managed tables, predictive optimization runs VACUUM automatically.
8 hours ago
Hey @Raj_DB , The TLDR is time travel is great for short-term ops and debugging, but brittle as your primary reporting history, and its cost profile is harder to control and reason about than a purpose-built history table.
Two reasons in particular:
1. Time travel is capped at delta.deletedFileRetentionDuration (default 7 days), regardless of VACUUM arguments.
2. Extending the window means raising retention (deletedFileRetentionDuration and logRetentionDuration), which increases storage and can slow queries because they scan more versions/files.
With a dedicated history table, you can:
