03-13-2022 09:30 PM
Hi Experts,
No doubt Databricks supports ACID properties. But when it comes to versioning, how many versions will Databricks capture?
For example: every time I run a DML operation on a Delta table, it is recorded in the transaction log. So how many such versions will be produced? Can we get back versions from the last 6 months?
Thanks
03-14-2022 06:54 PM
To time travel to a previous version, you need both the data and the metadata. The metadata (JSON files in the _delta_log directory) is retained for 30 days by default, so you need to increase that retention (delta.logRetentionDuration) to time travel to older versions.
Similarly, you need to increase how long old data files are kept (delta.deletedFileRetentionDuration), which defaults to 7 days.
Configuration details : https://docs.databricks.com/delta/delta-batch.html#data-retention
Read more about Delta log structure here - https://github.com/delta-io/delta/blob/master/PROTOCOL.md
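As a rough sketch of the two settings above, the helpers below only build the SQL strings, so they can be tried without a cluster. The table name "events", the 180-day window, and the helper names are assumptions made up for illustration; on Databricks the resulting strings would be passed to spark.sql(...).

```python
def retention_sql(table: str, days: int) -> str:
    """Build an ALTER TABLE statement that keeps both the log entries and
    the removed data files for `days` days (hypothetical helper)."""
    if days <= 0:
        raise ValueError("days must be positive")
    interval = f"interval {days} days"
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{interval}', "
        f"'delta.deletedFileRetentionDuration' = '{interval}')"
    )


def time_travel_sql(table: str, timestamp: str) -> str:
    """Build a query that reads the table as of a past timestamp
    (hypothetical helper)."""
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"


# Assumed table name and retention window, for illustration only.
alter_stmt = retention_sql("events", 180)
query_stmt = time_travel_sql("events", "2021-09-15")
```

With a Spark session available, spark.sql(alter_stmt) would apply the properties, and DESCRIBE HISTORY events lists the versions that are still reachable. The longer retention only protects versions created after it is set and not yet cleaned up.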
03-14-2022 03:15 AM
Yes, you can get indefinite versioning. The only command that deletes the data files of old Delta versions is VACUUM, so just never run it.
03-15-2022 12:44 AM
Also keep in mind that the log works at the file level, not the row level. So the overhead of keeping history is larger than in a classic RDBMS.
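To make the file-level point concrete, here is a toy copy-on-write model in plain Python. It is not Delta's actual implementation; the names storage, log, and update_one_row are invented for illustration. Changing a single row writes a whole new file, and the old file has to stay around for time travel.

```python
# One data file holding 1000 rows; version 0 of the table references it.
storage = {"part-0": list(range(1000))}
log = [{"part-0"}]  # each entry is the set of files active in that version


def update_one_row(storage, log, file_name, row_index, value):
    """Rewrite the whole file containing the row and record a new version.
    The old file is kept so earlier versions remain readable."""
    rows = list(storage[file_name])        # copy every row, not just one
    rows[row_index] = value
    new_name = f"part-{len(storage)}"
    storage[new_name] = rows
    log.append((log[-1] - {file_name}) | {new_name})
    return new_name


update_one_row(storage, log, "part-0", 7, -1)

# Total rows physically stored after changing a single row.
rows_stored = sum(len(rows) for rows in storage.values())
```

After the single-row update, rows_stored is 2000 even though the table still has 1000 rows, which is the extra history cost described above.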
07-20-2023 04:15 AM
Hey,
As a data enthusiast myself, I find this topic quite intriguing. Databricks does a good job of supporting ACID properties, ensuring data integrity, and allowing for versioning.
To address BasavarajAngadi's question: Databricks captures versions through the transaction log for each DML operation on Delta tables. Every time you perform a DML operation, a new version is recorded in the transaction log, so the number of versions depends on how often the data changes.
As for accessing past versions, you can retrieve versions from six months back, provided the log and data file retention periods are set long enough to cover that window. This allows historical analysis and rollback.
My advice to BasavarajAngadi (author) would be to explore the versioning and time travel functionality in Databricks thoroughly. Use it wisely to maintain data consistency and traceability, and to enable easy rollback when necessary.