Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How does Databricks time travel work?

srDataEngineer
New Contributor II

Hi,

Since this is not explained very well, I want to know whether each entry in the table history is a snapshot of the whole table at that point in time, containing all the data, or whether it only tracks some metadata about the table changes.

To be more precise: if I have a table into which I append data daily, does this mean that, for each moment in the history, I have files duplicating all the data the table contained at that moment? Thank you.

1 ACCEPTED SOLUTION

Hi @data engineer, if you are doing an insert, only the new data is written physically. The metadata of the operation is captured in a version file, which records, among other things, which data files were added or removed as part of that operation. When reading from the Delta table, time travel uses this information from the version files to work out which data files belong to the requested version.
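To make the mechanism above concrete, here is a minimal sketch in plain Python (no Spark needed) of how a reader could reconstruct the file list for a given version by replaying the added/removed entries. It is a simplified model, not Delta Lake's actual implementation: the `commits` dictionary stands in for the JSON version files under the table's `_delta_log` directory, the file names are made up, and the helper `files_at_version` is hypothetical.

```python
import json

# Simplified stand-in for the version (commit) files of a Delta table.
# Each entry is one JSON action: "add" or "remove", naming a data file.
# File names are invented for illustration only.
commits = {
    0: ['{"add": {"path": "part-day1.parquet"}}'],
    1: ['{"add": {"path": "part-day2.parquet"}}'],
    2: ['{"add": {"path": "part-day3.parquet"}}'],
    3: [  # an operation that rewrites data: old files removed, new one added
        '{"remove": {"path": "part-day1.parquet"}}',
        '{"remove": {"path": "part-day2.parquet"}}',
        '{"add": {"path": "part-compacted.parquet"}}',
    ],
}


def files_at_version(commits, version):
    """Replay commits 0..version and return the data files live at that version."""
    live = set()
    for v in range(version + 1):
        for line in commits[v]:
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)


# Each daily append contributes one new file; earlier files are reused, not copied.
for v in commits:
    print(v, files_at_version(commits, v))
```

In this model, versions 0 to 2 each add a single file, so version 2 is simply the three daily files together. Nothing from day 1 is rewritten when day 2 is appended, which is the point the answer makes about appends.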


4 REPLIES 4

Rishabh-Pandey
Esteemed Contributor

Hey @data engineer, the idea behind Databricks Delta time travel is to let you roll back changes you made by mistake, or to inspect the table as of an earlier version or date. When you append new records daily, each append creates a new version of the table at that point in time, so if you want to check what data you loaded the previous day, you can do that with time travel.
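As a rough illustration of checking an earlier load, the helpers below wrap the Delta reader options `versionAsOf` and `timestampAsOf` and the `DESCRIBE HISTORY` command. The function names, the `spark` argument (an existing SparkSession with Delta Lake available), and the table path are assumptions made for this sketch, not part of the original thread.

```python
def read_as_of_version(spark, table_path, version):
    """Read the table as it was at a given version number."""
    return (
        spark.read.format("delta")
        .option("versionAsOf", version)
        .load(table_path)
    )


def read_as_of_timestamp(spark, table_path, timestamp):
    """Read the table as it was at a given timestamp string, e.g. '2023-01-31'."""
    return (
        spark.read.format("delta")
        .option("timestampAsOf", timestamp)
        .load(table_path)
    )


def table_history(spark, table_path):
    """List the table's versions with their timestamps and operations."""
    return spark.sql(f"DESCRIBE HISTORY delta.`{table_path}`")


# Example usage (hypothetical path, requires a running Spark session):
#   table_history(spark, "/mnt/data/events").show()
#   yesterday = read_as_of_version(spark, "/mnt/data/events", 1)
```

Looking at the history first is usually the easier route, since it shows which version number corresponds to which daily append.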

Rishabh Pandey

Yes, that's the well-known aim of time travel. What I want to understand is this: if I have a Delta table and append some rows to it, are only the appended rows written physically, or is the whole old data duplicated physically into two versions, the first in one folder from before the append and the other in another folder with the appended rows?


Anonymous
Not applicable

Hi @data engineer

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!
