
Question on transaction logs and versioning in Databricks

BasavarajAngadi
Contributor

Hi Experts,

No doubt Databricks supports ACID properties. But when it comes to versioning, how many versions does Databricks capture?

For example: every time I run a DML operation on a Delta table, it is recorded in the transaction log, so how many such versions will be produced? Can we get back versions from the last 6 months?

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions

User16869510359
Esteemed Contributor

To time travel to a previous version, you need both the data and the metadata. The metadata (JSON files in the Delta log directory) has a default retention of 30 days, so to time travel to older versions you need to increase the log retention (delta.logRetentionDuration).

Similarly, you need to increase how long old data files are kept (delta.deletedFileRetentionDuration), which defaults to 7 days.
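
As a rough sketch of what raising both retention settings could look like, the snippet below builds an `ALTER TABLE ... SET TBLPROPERTIES` statement for a six-month window. The table name `events` and the helper `retention_sql` are hypothetical, and 180 days is only an approximation of six months. On Databricks the resulting string would typically be passed to `spark.sql(...)`.

```python
def retention_sql(table: str, days: int) -> str:
    """Build an ALTER TABLE statement that sets both Delta retention properties."""
    if days <= 0:
        raise ValueError("days must be positive")
    interval = f"interval {days} days"
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{interval}', "
        f"'delta.deletedFileRetentionDuration' = '{interval}')"
    )


# "events" is a placeholder table name; 180 days approximates six months.
stmt = retention_sql("events", 180)
print(stmt)
```

Both properties are set to the same interval here because time travel needs the log entries and the data files they point to; keeping only one of the two is not enough.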

Configuration details: https://docs.databricks.com/delta/delta-batch.html#data-retention

Read more about the Delta log structure here: https://github.com/delta-io/delta/blob/master/PROTOCOL.md

4 REPLIES

Hubert-Dudek
Esteemed Contributor III

Yes, you can keep versions indefinitely. The only command that deletes old Delta versions is VACUUM, so just never run it.
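
If VACUUM does need to run at some point, one cautious pattern is to pass an explicit retention window and preview the result with `DRY RUN` before deleting anything. The sketch below only builds the SQL string; the helper `vacuum_sql` and the table name `events` are hypothetical, and the 180-day window is an assumption chosen to match the six months asked about in the question.

```python
def vacuum_sql(table: str, retain_days: int, dry_run: bool = True) -> str:
    """Build a VACUUM statement that keeps files newer than retain_days."""
    if retain_days < 0:
        raise ValueError("retain_days must not be negative")
    hours = retain_days * 24  # VACUUM expresses its retention window in hours
    stmt = f"VACUUM {table} RETAIN {hours} HOURS"
    if dry_run:
        stmt += " DRY RUN"  # only lists the files that would be deleted
    return stmt


preview = vacuum_sql("events", 180)                  # safe preview
cleanup = vacuum_sql("events", 180, dry_run=False)   # actual cleanup
print(preview)
print(cleanup)
```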

-werners-
Esteemed Contributor III

Also keep in mind that the log works at the file level, not the row level, so the overhead of keeping history is larger than in a classic RDBMS.
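
To make the file-level point concrete, here is a toy copy-on-write model in plain Python. It is not Delta's actual implementation, just an illustration under simplifying assumptions: a table of 1000 rows split into 4 files, where updating a single row rewrites the whole file that contains it. All names (`write_initial`, `update_row`, the `part-N-vM` file naming) are made up for the example.

```python
def write_initial(num_rows: int, rows_per_file: int) -> dict:
    """Version 0: split row ids into fixed-size files."""
    files = {}
    for start in range(0, num_rows, rows_per_file):
        name = f"part-{len(files)}-v0"
        files[name] = list(range(start, min(start + rows_per_file, num_rows)))
    return files


def update_row(current: dict, row_id: int, version: int) -> dict:
    """Return the file set of the next version after updating one row.

    The file holding row_id is rewritten in full; all other files are reused.
    """
    new_files = {}
    for name, rows in current.items():
        if row_id in rows:
            base = name.rsplit("-v", 1)[0]
            new_files[f"{base}-v{version}"] = list(rows)  # full copy of the file
        else:
            new_files[name] = rows  # untouched file is shared between versions
    return new_files


v0 = write_initial(1000, 250)      # 4 files of 250 rows each
v1 = update_row(v0, 10, 1)         # change a single row

# Keeping both versions readable means keeping every file either one references.
retained = {**v0, **v1}
rows_stored = sum(len(rows) for rows in retained.values())
print(len(retained), rows_stored)  # 5 1250
```

In this toy model a one-row update costs an extra 250 stored rows, because the old file has to stay around for as long as version 0 must remain readable.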

stefnhuy
New Contributor III

Hey,

As a data enthusiast myself, I find this topic quite intriguing. Databricks does a good job of supporting ACID properties, ensuring data integrity, and allowing for versioning.

To address BasavarajAngadi's question: Databricks records a new version in the transaction log every time you perform a DML operation on a Delta table, so the number of versions depends on how often the data changes.

As for accessing past versions, you can retrieve versions from six months back, provided the log and data file retention periods cover that window (see the accepted solution). This allows for historical analysis and rollback.

My advice to BasavarajAngadi (author) would be to explore the versioning and time travel functionality in Databricks thoroughly, and to use it wisely to maintain data consistency and traceability and to enable easy rollback when necessary.
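
For reference, reading an older version is usually done with a `VERSION AS OF` or `TIMESTAMP AS OF` clause, and `DESCRIBE HISTORY` lists the versions a table currently has. The sketch below builds such queries as strings; the helper `time_travel_sql`, the table name `events`, and the fixed example date are assumptions for illustration. Whether the query succeeds depends on the retention settings discussed in this thread.

```python
from datetime import datetime, timedelta
from typing import Optional


def time_travel_sql(
    table: str,
    *,
    version: Optional[int] = None,
    timestamp: Optional[datetime] = None,
) -> str:
    """Build a time travel SELECT for exactly one of version or timestamp."""
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        return f"SELECT * FROM {table} VERSION AS OF {int(version)}"
    ts = timestamp.strftime("%Y-%m-%d %H:%M:%S")
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{ts}'"


now = datetime(2024, 7, 1, 12, 0, 0)        # fixed date so the example is reproducible
six_months_ago = now - timedelta(days=180)  # 180 days approximates six months

by_version = time_travel_sql("events", version=5)
by_timestamp = time_travel_sql("events", timestamp=six_months_ago)
print(by_version)
print(by_timestamp)
```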