Data Engineering

How does Databricks time travel work?

srDataEngineer
New Contributor II

Hi,

Since it is not explained very well, I want to know whether the table history is a snapshot of the whole table at each point in time, containing all the data, or whether it only tracks metadata about changes to the table.

To be more precise: if I have a table that I append data to daily, does that mean I end up with files containing duplicated data, one full copy of the table for each point in its history? Thank you.

1 ACCEPTED SOLUTION


Lakshay
Esteemed Contributor

Hi @data engineer, if you are doing an insert, only the new data is physically written. The metadata for the operation is captured in a version file, which records, among other things, which files were added or removed as part of that operation. When you read from the Delta table, it uses the information in the version files to perform time travel.
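To make this concrete, here is a minimal Python sketch of the idea. It is not Delta Lake's actual implementation: it assumes a simplified log in which each version records only the data files it added or removed, and the file names and the `files_at_version` helper are hypothetical.

```python
# Simplified model of a Delta-style transaction log (illustrative only).
# Each entry is one table version and lists the data files it added/removed.
log = [
    {"add": ["part-000.parquet"], "remove": []},  # version 0: initial load
    {"add": ["part-001.parquet"], "remove": []},  # version 1: daily append
    {"add": ["part-002.parquet"], "remove": []},  # version 2: daily append
    # version 3: an operation that rewrites the contents of part-000
    {"add": ["part-003.parquet"], "remove": ["part-000.parquet"]},
]


def files_at_version(log, version):
    """Replay the log up to `version` and return the files that make up the table."""
    if not 0 <= version < len(log):
        raise ValueError(f"version {version} is not in the log")
    live = set()
    for entry in log[: version + 1]:
        live.update(entry["add"])
        live.difference_update(entry["remove"])
    return sorted(live)


print(files_at_version(log, 1))  # files visible after the first append
print(files_at_version(log, 3))  # files visible after the rewrite
```

In this model an append adds one new file and leaves the older files untouched, so every version is just a different list of references to the same physical files. Nothing is copied per version.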


4 REPLIES

Rishabh264
Honored Contributor II

Hey @data engineer, the idea behind Databricks Delta time travel is to let you roll back changes made by mistake, or to check the table as of a given version or date. When you append new records daily, each append creates a new version of the table at that point in time, so if you want to check what data you loaded the previous day, you can do that with time travel.
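Checking "what was loaded yesterday" comes down to mapping a timestamp to a table version. Below is a rough Python sketch of that lookup, assuming time travel by timestamp resolves to the latest version committed at or before the requested time. The commit times and the `version_as_of` helper are made up for illustration.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical commit times, one per table version (list index = version number).
commit_times = [
    datetime(2023, 1, 1, 6, 0),
    datetime(2023, 1, 2, 6, 0),
    datetime(2023, 1, 3, 6, 0),
]


def version_as_of(commit_times, ts):
    """Return the latest version committed at or before `ts`."""
    idx = bisect_right(commit_times, ts) - 1
    if idx < 0:
        raise ValueError("timestamp is earlier than the first commit")
    return idx


# State of the table late on Jan 2, before the Jan 3 append.
print(version_as_of(commit_times, datetime(2023, 1, 2, 23, 59)))
```

On Databricks itself you would not write this lookup by hand: a query can name the version or timestamp directly with `VERSION AS OF` or `TIMESTAMP AS OF`.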

Yes, that's the well-known aim of time travel. What I want to understand is this: if I have a Delta table and append some rows to it, are only the appended rows physically written, or is the whole old dataset physically duplicated into two versions, the first in a folder from before the append and the other in another folder with the appended rows?

Anonymous
Not applicable

Hi @data engineer​ 

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!
