Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Time Travel vs Bronze historical archive

Locomo_Dncr
New Contributor

Hello

I am building a pipeline using the Medallion architecture. The source tables in this pipeline are fully overwritten each time they are updated. In the bronze ingestion layer, I plan to append each new version of the source table to the existing bronze table, adding an ingestion timestamp column that records when the data was ingested. I also plan to remove any data older than the retention period set by my team.
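The plan above (append each full snapshot with an ingestion timestamp, then prune rows older than the retention period) can be sketched as follows. This is a plain-Python model, not Spark code: a list of dicts stands in for the bronze Delta table, and `append_snapshot`, `prune_expired`, `snapshot_as_of` and the `ingestion_ts` column are hypothetical names. In PySpark the same steps would roughly be `withColumn("ingestion_ts", current_timestamp())` plus an append write, followed by a `DELETE` on the bronze table.

```python
from datetime import datetime, timedelta, timezone


def append_snapshot(bronze, snapshot, ingested_at):
    """Return bronze plus every row of a full source snapshot, tagged with ingestion_ts."""
    return bronze + [{**row, "ingestion_ts": ingested_at} for row in snapshot]


def prune_expired(bronze, now, retention):
    """Drop rows ingested before (now - retention), i.e. outside the team's retention period."""
    cutoff = now - retention
    return [row for row in bronze if row["ingestion_ts"] >= cutoff]


def snapshot_as_of(bronze, as_of):
    """Rebuild the source table as of a point in time: the latest ingestion at or before as_of."""
    times = [row["ingestion_ts"] for row in bronze if row["ingestion_ts"] <= as_of]
    if not times:
        return []
    latest = max(times)
    return [row for row in bronze if row["ingestion_ts"] == latest]


# Two daily overwrites of a one-row source table.
t0 = datetime(2025, 7, 1, tzinfo=timezone.utc)
bronze = append_snapshot([], [{"id": 1, "status": "new"}], t0)
bronze = append_snapshot(bronze, [{"id": 1, "status": "shipped"}], t0 + timedelta(days=1))

print(snapshot_as_of(bronze, t0 + timedelta(hours=12)))  # the day-one snapshot
bronze = prune_expired(bronze, now=t0 + timedelta(days=31), retention=timedelta(days=30))
print(len(bronze))  # 1: only the day-two snapshot is still inside the window
```

Because every snapshot is stored in full, storage grows with the number of ingestions kept, and point-in-time reads become an ordinary filter on `ingestion_ts`.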

I was wondering what the difference is between using Delta Lake Time Travel and maintaining a bronze layer that contains historical data. If the table is very big, is there a major cost difference between relying on Time Travel and keeping historical copies in bronze? Is there a processing-time benefit to having a bronze layer that contains historical data rather than using Time Travel?

4 REPLIES

KaranamS
Contributor III

Hi @Locomo_Dncr,

The default retention period for the data files that Time Travel relies on is 7 days. If your retention period is longer than that, maintaining historical data in the bronze layer is better than relying on Time Travel. Time Travel is best suited for short-term access and requires managing log and data-file retention, whereas a historical bronze table offers better control and query performance, especially for larger datasets.
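As a rough illustration of that retention management: if you did want Time Travel to cover a longer window, both Delta retention table properties have to be raised, `delta.logRetentionDuration` (default 30 days) and `delta.deletedFileRetentionDuration` (default 7 days). The helper below is a hypothetical sketch that only builds the SQL string; the table name `bronze.orders` is an assumption, and you would run the result with `spark.sql(...)` in a notebook.

```python
def retention_ddl(table: str, days: int) -> str:
    """Build an ALTER TABLE statement that extends both Delta retention settings.

    delta.logRetentionDuration: how long transaction-log history is kept.
    delta.deletedFileRetentionDuration: how old an unreferenced data file must be
    before VACUUM may delete it. Time Travel needs both the log and the files.
    """
    if days < 1:
        raise ValueError("retention must be at least one day")
    interval = f"interval {days} days"
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{interval}', "
        f"'delta.deletedFileRetentionDuration' = '{interval}')"
    )


print(retention_ddl("bronze.orders", 90))
```

Keep in mind that a longer data-file retention means every overwritten copy of the table stays in storage for that long, which is where the cost question comes in for very large, frequently overwritten tables.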

Hope this helps!

lingareddy_Alva
Honored Contributor III

Hi @Locomo_Dncr 

Great question about Delta Lake Time Travel vs. maintaining historical data in your bronze layer.

Delta Time Travel:

Cost: Lower for tables with few changes, higher for frequently updated large tables
Performance: Instant point-in-time access, optimized metadata queries
Best for: Smaller tables, short retention periods, simple recovery needs

Bronze Historical Layer:

Cost: Higher storage but predictable, grows linearly with ingestions
Performance: Better for time-range analytics, custom optimizations
Best for: Very large tables, long retention, complex time-based queries

For very big tables: Bronze historical layer is usually more cost-effective due to Delta's copy-on-write overhead.
Recommended: Hybrid approach - Time Travel for recent data (7-30 days) + Bronze historical for long-term retention.
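Under the hybrid approach, a point-in-time request has to be routed to one of two places. The sketch below is a hypothetical helper, not a Databricks API: it assumes naive UTC datetimes, a UTC session time zone, a Delta live table that still has Time Travel history for the last `time_travel_window`, and a bronze table with one full snapshot per `ingestion_ts` value. The table names are illustrative.

```python
from datetime import datetime, timedelta


def point_in_time_sql(live_table, bronze_table, as_of, now,
                      time_travel_window=timedelta(days=7)):
    """Build a query for the table's state at as_of under a hybrid retention setup."""
    if as_of > now:
        raise ValueError("as_of must not be in the future")
    ts = as_of.strftime("%Y-%m-%d %H:%M:%S")
    if now - as_of <= time_travel_window:
        # Recent: Delta Time Travel on the live table should still have these files.
        return f"SELECT * FROM {live_table} TIMESTAMP AS OF '{ts}'"
    # Older than the Time Travel window: take the latest bronze snapshot
    # ingested at or before as_of.
    return (
        f"SELECT * FROM {bronze_table} WHERE ingestion_ts = "
        f"(SELECT max(ingestion_ts) FROM {bronze_table} WHERE ingestion_ts <= '{ts}')"
    )


now = datetime(2025, 7, 31)
print(point_in_time_sql("sales.orders", "bronze.orders_history", now - timedelta(days=2), now))
print(point_in_time_sql("sales.orders", "bronze.orders_history", now - timedelta(days=60), now))
```

The SQL is assembled with f-strings for readability; with untrusted input you would want parameterized queries instead.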


LR

szymon_dybczak
Esteemed Contributor III

Hi @Locomo_Dncr ,

I wouldn't recommend using Time Travel to maintain historical versions. The Time Travel feature should be used to audit operations, roll back a table, or query a table at a specific point in time.

Keep in mind that to query a previous table version, you must retain both the log and the data files for that version. So if someone runs VACUUM, your history could vanish.
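To make the VACUUM point concrete, here is a deliberately simplified toy model, not Delta's actual implementation: each version lists the data files it reads, `removed_at` records when a file stopped being referenced by the latest version, and VACUUM (7-day threshold, the default) deletes unreferenced files older than that threshold. It ignores log cleanup via `delta.logRetentionDuration`, which limits history independently. All names are hypothetical.

```python
from datetime import datetime, timedelta


def vacuum(versions, removed_at, on_disk, now, retention=timedelta(days=7)):
    """Simplified VACUUM: keep a file if the latest version uses it, if it was never
    removed, or if it has been unreferenced for no longer than the retention threshold."""
    latest = versions[max(versions)]
    return {
        f
        for f in on_disk
        if f in latest or f not in removed_at or now - removed_at[f] <= retention
    }


def readable_versions(versions, on_disk):
    """A version can be time-travelled to only if every file it references still exists."""
    return sorted(v for v, files in versions.items() if files <= on_disk)


# Three full overwrites: each version points at a completely new data file.
t0 = datetime(2025, 7, 1)
versions = {0: {"part-0"}, 1: {"part-1"}, 2: {"part-2"}}
removed_at = {"part-0": t0 + timedelta(days=1), "part-1": t0 + timedelta(days=20)}
on_disk = {"part-0", "part-1", "part-2"}

print(readable_versions(versions, on_disk))  # [0, 1, 2]
on_disk = vacuum(versions, removed_at, on_disk, now=t0 + timedelta(days=21))
print(readable_versions(versions, on_disk))  # [1, 2]
```

Version 0's only file was unreferenced for 20 days, so VACUUM deleted it and that version can no longer be queried, even though its log entry may still exist.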


MariuszK
Valued Contributor III

Hi @Locomo_Dncr 

Time Travel isn't recommended for storing historical data. It's meant for short-term recovery and auditing. To keep history, you can store snapshots or use SCD Type 2 (SCD2).

"Databricks does not recommend using Delta Lake table history as a long-term backup solution for data archival. Databricks recommends using only the past 7 days for time travel operations unless you have set both data and log retention configurations to a larger value."
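The SCD2 option mentioned above can be sketched like this. It is a plain-Python model with hypothetical names (`scd2_apply`, `valid_from`, `valid_to`, `is_current`); on Databricks this would typically be written as a `MERGE INTO` on a Delta table. It assumes each run receives a full snapshot and treats keys missing from the snapshot as deleted.

```python
from datetime import date

TRACKING = ("valid_from", "valid_to", "is_current")


def scd2_apply(history, snapshot, key, as_of):
    """Return a new SCD2 history after applying one full source snapshot.

    Changed keys: close the current row and open a new current row.
    Missing keys: close the current row (treated as deleted).
    Unchanged keys: leave the current row as it is.
    """
    incoming = {row[key]: row for row in snapshot}
    result, unchanged = [], set()
    for row in history:
        row = dict(row)
        if row["is_current"]:
            new = incoming.get(row[key])
            attrs = {k: v for k, v in row.items() if k not in TRACKING}
            if new is not None and attrs == new:
                unchanged.add(row[key])
            else:
                row["valid_to"] = as_of
                row["is_current"] = False
        result.append(row)
    for k, new in incoming.items():
        if k not in unchanged:
            result.append({**new, "valid_from": as_of, "valid_to": None, "is_current": True})
    return result


h = scd2_apply([], [{"id": 1, "status": "new"}], "id", date(2025, 7, 1))
h = scd2_apply(h, [{"id": 1, "status": "new"}], "id", date(2025, 7, 2))      # no change
h = scd2_apply(h, [{"id": 1, "status": "shipped"}], "id", date(2025, 7, 3))  # one change
print(len(h))  # 2
```

Unlike storing a full snapshot per ingestion, SCD2 only adds rows for keys that actually changed, which tends to keep history much smaller for large, slowly changing tables.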
