Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.

Delta Table and history

patilsuhasv
New Contributor

Hi All,

How can I maintain 7 years of transactional data in delta table? Can I have log retention of 7 days, but data retention of 7 years?

Appreciate your response.

Thanks and regards 

Suhas

2 REPLIES

Louis_Frolio
Databricks Employee

Hey @patilsuhasv,

Yes, you absolutely can maintain 7 years of transactional data in a Delta table while having only 7 days of log retention. These are two separate concepts that work independently.

Understanding the Difference

Log Retention controls how long Delta Lake keeps the transaction log history (used for time travel and versioning capabilities). This doesn't delete your actual data.

Data Retention refers to how long you keep the actual data records in your table. Your current data remains in the table regardless of log retention settings.

Key Configuration

There are two main properties to understand:

delta.logRetentionDuration: Controls transaction log history (default 30 days)
- This affects your ability to time travel to previous versions
- Setting this to 7 days means you can only query versions from the last 7 days

delta.deletedFileRetentionDuration: Controls when VACUUM removes old data files (default 7 days)
- This only affects files that are no longer part of the current table (due to updates, deletes, or merges)
- Does NOT delete your current active data
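To see how a table is currently configured, you can inspect its properties and version history directly (the table name `your_table` is a placeholder):

```sql
-- Check the retention-related table properties
SHOW TBLPROPERTIES your_table;

-- Review the available version history, i.e. how far back time travel can reach
DESCRIBE HISTORY your_table;
```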

Solution for Your Use Case

To keep 7 years of data with 7 days of log retention:

```sql
-- Set log retention to 7 days
ALTER TABLE your_table SET TBLPROPERTIES (
'delta.logRetentionDuration' = 'interval 7 days'
);
```

Your current transactional data will remain in the table indefinitely. The 7-day log retention simply means:
- You can only time travel back 7 days
- Transaction logs older than 7 days will be cleaned up automatically
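As a sketch of the practical effect (table name and dates are placeholders), with 7-day log retention:

```sql
-- Current state: all rows, however old, are still here
SELECT * FROM your_table;

-- Time travel within the log retention window: allowed
SELECT * FROM your_table TIMESTAMP AS OF '2025-11-05';  -- e.g. 3 days ago

-- Time travel beyond log retention: fails with a retention error
SELECT * FROM your_table TIMESTAMP AS OF '2025-10-01';  -- e.g. 30+ days ago
```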

Important Considerations

What Gets Kept:
- All current data in your table (regardless of age) stays
- Only historical versions beyond 7 days become inaccessible for time travel
- Active records from 7 years ago remain queryable as part of the current table state

What Gets Cleaned:
- Transaction logs older than 7 days
- Old data file versions (if you run VACUUM) that are no longer part of the current table

Best Practices:
- Don't run VACUUM frequently if you need longer time travel capabilities
- If you never update/delete records, your data files stay intact even with short log retention
- Consider your compliance and audit requirements before reducing log retention
- Storage costs are primarily driven by data volume, not log retention

Example Scenario

If you have a transaction table with records from 2018-2025:
- With 7 days log retention: All transactions from 2018-2025 remain queryable in the current table
- You just can't time travel to see what the table looked like 30 days ago
- The actual transaction records themselves are NOT deleted
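In other words (hypothetical table and column names), old records stay reachable through ordinary queries, with no time travel involved:

```sql
-- No time travel needed: 2018 rows are part of the current table state
SELECT count(*) FROM your_table
WHERE transaction_date BETWEEN '2018-01-01' AND '2018-12-31';
```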

The log retention setting is about versioning history, not data lifecycle management. Your 7-year-old transactions will remain accessible in your table's current state.

Hope this helps, Louis.



Isi
Honored Contributor III

Hi @patilsuhasv,

Maintaining 7 years of transactional data in a Delta Lake table is technically possible, but it requires careful configuration, and it's important to understand the recent changes Databricks announced (effective December 2025) regarding time travel and VACUUM.

Databricks is aligning time travel retention with table-level retention settings to ensure consistent and deterministic behavior.

From that date onward:

  • Time travel queries (SELECT ... AS OF, RESTORE, CLONE, CDC) will only be allowed within the period defined by delta.deletedFileRetentionDuration.

  • The retention argument in VACUUM will be ignored (except when set to 0 hours, to fully purge history).

  • delta.logRetentionDuration must be greater than or equal to delta.deletedFileRetentionDuration.

So a configuration where delta.logRetentionDuration is shorter than delta.deletedFileRetentionDuration will no longer be allowed after the upcoming Databricks update: log retention must always be equal to or greater than delta.deletedFileRetentionDuration.
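Under those rules, a compliant configuration keeps log retention at least as long as deleted-file retention. A sketch with placeholder names and durations:

```sql
-- Valid after the change: log retention >= deleted file retention
ALTER TABLE your_table SET TBLPROPERTIES (
  'delta.deletedFileRetentionDuration' = 'interval 7 days',
  'delta.logRetentionDuration' = 'interval 30 days'
);
```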

As per the official Databricks documentation:

Delta table history โ€” Databricks Docs

"Databricks does not recommend using Delta Lake table history as a long-term backup solution for data archival"
"Databricks recommends using only the past 7 days for time travel operations, unless you have set both data and log retention configurations to a larger value.โ€

If your goal is regulatory data retention (e.g., 7 years), you should:

  • Keep your operational Delta table with a short retention (e.g., 7-30 days).

  • Periodically snapshot or archive the table (e.g., by writing it to versioned Parquet files, or exporting to object storage such as S3 Glacier).

  • Use those archives for compliance or recovery purposes, not as active Delta history.
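A minimal archival sketch, assuming a yearly cadence; the table name, bucket, and path are placeholders (a DEEP CLONE to a separate archive location would be another option):

```sql
-- Hypothetical yearly snapshot: export the current state to Parquet
-- in object storage for long-term retention
CREATE TABLE transactions_archive_2025
USING PARQUET
LOCATION 's3://my-archive-bucket/transactions/year=2025/'
AS SELECT * FROM your_table;
```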


Hope this helps :),

Isi