yesterday
Hi everyone,
I'm working on a use case where I need to retain 30 days of historical data in a Delta table and use it to build trend reports.
I'm looking for the best approach to reliably maintain this historical data while also making it suitable for reporting. I explored the Delta Lake time travel feature, but it currently only allows access to about 7 days of history, which doesn't meet my requirement.
What would be the recommended way to handle this scenario?
Thank you!
yesterday - last edited yesterday
Hi @Raj_DB ,
For reliable 30-day trend reporting, don't rely on time travel; use an explicit snapshot table instead.
The pattern:
1. Daily job appends a snapshot of your source table to a history table with a snapshot_date partition column
2. Purge rows older than 30 days with a scheduled DELETE WHERE snapshot_date < current_date() - INTERVAL 30 DAYS
3. Run VACUUM afterwards to reclaim the storage held by the deleted rows.
This gives you full control over retention and keeps trend queries fast via partition pruning.
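The three steps above can be sketched in SQL roughly as follows. All table and column names (sales, sales_history, snapshot_date) are illustrative, and the history table is assumed to have been created once as a Delta table partitioned by snapshot_date:

```sql
-- Assumes a one-time setup along the lines of:
--   CREATE TABLE sales_history (..., snapshot_date DATE)
--   USING DELTA PARTITIONED BY (snapshot_date);

-- 1. Daily snapshot append (run as a scheduled job)
INSERT INTO sales_history
SELECT s.*, current_date() AS snapshot_date
FROM sales s;

-- 2. Purge snapshots older than 30 days
DELETE FROM sales_history
WHERE snapshot_date < current_date() - INTERVAL 30 DAYS;

-- 3. Reclaim storage for the deleted files
--    (respects the table's deleted-file retention settings)
VACUUM sales_history;
```

Because the table is partitioned by snapshot_date, both the purge and any trend query touching a date range only scan the relevant partitions.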
8 hours ago
Thanks @lingareddy_Alva for your suggestion.
yesterday
You're hitting the new 7-day time-travel enforcement: in recent runtimes, time travel is blocked beyond delta.deletedFileRetentionDuration (default 7 days), regardless of VACUUM args or logRetentionDuration.
To support 30 days of historical data for reporting, you have two options:
If 30 days is enough history and youโre okay paying for the extra storage, just configure the tableโs Delta properties:
ALTER TABLE catalog.schema.table_name
SET TBLPROPERTIES (
'delta.deletedFileRetentionDuration' = 'interval 30 days',
'delta.logRetentionDuration' = 'interval 30 days'
);
delta.deletedFileRetentionDuration controls how long old data files are kept and therefore the maximum time-travel window (default 7 days).
delta.logRetentionDuration controls how long the transaction log/history is kept (default 30 days) and must be at least deletedFileRetentionDuration in new runtimes.
Setting delta.deletedFileRetentionDuration to 'interval 30 days' matches the default logRetentionDuration. This is the simplest way to have a 30-day rolling history directly available via time travel for your trend reports.
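With those properties in place, a trend report can read the table as of any point in the 30-day window via time travel. A quick illustrative query (the table name is a placeholder):

```sql
-- Read the table as it looked 14 days ago;
-- works only within the deletedFileRetentionDuration window.
SELECT *
FROM catalog.schema.table_name TIMESTAMP AS OF date_sub(current_date(), 14);
```

You can also use a literal, e.g. TIMESTAMP AS OF '2025-01-15', or VERSION AS OF with a commit version from DESCRIBE HISTORY.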
Delta docs and guidance caution against relying on table history as a long-term backup or analytics store and recommend using history/time travel mainly for short-term ops and recovery.
For reporting use cases (especially if you may want >30 days later), a common pattern is:
1. Maintain a dedicated append-only history table, partitioned by a snapshot_date (or business date) column.
2. Prune old snapshots on a schedule by snapshot_date, keeping whatever window reporting needs.
This pattern keeps reporting predictable and decoupled from operational retention and VACUUM settings, while you still keep ~30 days of native time travel for debugging and recovery.
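For reference, a typical trend query against such a snapshot table looks like this; the snapshot_date filter lets Delta prune partitions so only 30 days of data are scanned (sales_history and amount are hypothetical names):

```sql
-- 30-day daily trend; the WHERE clause prunes to the last 30 partitions
SELECT snapshot_date,
       SUM(amount) AS total_amount
FROM sales_history
WHERE snapshot_date >= current_date() - INTERVAL 30 DAYS
GROUP BY snapshot_date
ORDER BY snapshot_date;
```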
4 hours ago
Hi @DivyaandData, Thank you so much for the explanation. I have one question: why is time travel not a good choice, and does it cost more than using a simple append-only history table?
an hour ago
Hey @Raj_DB, the TL;DR is: time travel is great for short-term ops and debugging, but brittle as your primary reporting history, and its cost profile is harder to control and reason about than a purpose-built history table's.
1. Time travel is hard-capped at delta.deletedFileRetentionDuration (default 7 days), regardless of VACUUM arguments.
2. Extending that window means retaining more old data files (via deletedFileRetentionDuration and logRetentionDuration), which increases storage and can slow queries because they scan more versions/files.
With a dedicated history table, you can control retention explicitly, pay only for the snapshots you actually keep, and keep trend queries fast via partition pruning.