Data Engineering

Question on transaction logs and versioning in Databricks

BasavarajAngadi
Contributor

Hi Experts,

No doubt Databricks supports ACID properties. But when it comes to versioning, how many versions does Databricks capture?

For example: every time I run a DML operation on a Delta table, it is captured in the transaction log, so how many such versions will be produced? Can we get back versions from the last 6 months?

Thanks

Accepted Solutions

brickster_2018
Esteemed Contributor

To time travel to a previous version, you need both data and metadata. Metadata (JSON files in the Delta log directory) is retained for 30 days by default. To time travel to older versions, you need to increase that retention (delta.logRetentionDuration).

Similarly, you need to increase how long old data files are kept (delta.deletedFileRetentionDuration).
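As a rough illustration of the two settings above, here is a minimal sketch that builds the `ALTER TABLE ... SET TBLPROPERTIES` statement for both properties. The helper name `retention_statement`, the table name `my_db.events`, and the 180-day value are assumptions made for the example, not anything from this thread. On a cluster you would pass the resulting string to `spark.sql(...)`.

```python
def retention_statement(table: str, days: int) -> str:
    """Build an ALTER TABLE statement that sets both Delta retention properties.

    `table` and `days` are hypothetical inputs chosen for illustration.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    interval = f"interval {days} days"
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{interval}', "
        f"'delta.deletedFileRetentionDuration' = '{interval}')"
    )


# Hypothetical table name; roughly six months of retention.
stmt = retention_statement("my_db.events", 180)
print(stmt)
# On Databricks you would then run: spark.sql(stmt)
```

Both properties are set to the same interval here because time travel needs the log entries and the data files they point to.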

Configuration details : https://docs.databricks.com/delta/delta-batch.html#data-retention

Read more about Delta log structure here - https://github.com/delta-io/delta/blob/master/PROTOCOL.md


REPLIES

Hubert-Dudek
Esteemed Contributor III

Yes, you can keep versions indefinitely. The only command that deletes old Delta versions is VACUUM, so just never run it.

-werners-
Esteemed Contributor III

Also keep in mind that the log works at the file level, not the row level. So the overhead of keeping history is larger than in a classic RDBMS.
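To make the file-level point concrete, here is a toy copy-on-write model in Python. It is only a sketch of the idea, not how Delta is implemented: the `ToyTable` class and its methods are invented for illustration, and real tables may apply optimizations this model ignores. It shows that changing one row rewrites the whole file, while the old file has to stay on disk for earlier versions to remain readable.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy model: every version is just a set of immutable data files."""
    files: dict = field(default_factory=dict)     # file name -> list of rows
    versions: list = field(default_factory=list)  # per version: set of active file names
    _counter: int = 0

    def _new_file(self, rows):
        name = f"part-{self._counter}"
        self._counter += 1
        self.files[name] = list(rows)
        return name

    def write(self, rows):
        """Append a new file and record a new version."""
        active = set(self.versions[-1]) if self.versions else set()
        active.add(self._new_file(rows))
        self.versions.append(active)

    def update_row(self, old, new):
        """Rewrite every active file containing `old`; old files are kept for history."""
        active = set(self.versions[-1])
        for name in list(active):
            rows = self.files[name]
            if old in rows:
                rewritten = [new if r == old else r for r in rows]
                active.remove(name)
                active.add(self._new_file(rewritten))
        self.versions.append(active)

    def stored_rows(self):
        """Rows physically kept on disk across all files, old and new."""
        return sum(len(rows) for rows in self.files.values())

    def live_rows(self):
        """Rows visible in the latest version."""
        return sum(len(self.files[name]) for name in self.versions[-1])


table = ToyTable()
table.write(range(1000))   # version 0: one file with 1000 rows
table.update_row(5, -5)    # version 1: a single row changes
print(table.live_rows(), table.stored_rows(), len(table.versions))
```

In this model, the single-row update doubles the stored rows while the live row count stays the same, which is the overhead described above.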

stefnhuy
New Contributor III

Hey,

As a data enthusiast myself, I find this topic quite intriguing. Databricks indeed does a fantastic job in supporting ACID properties, ensuring data integrity, and allowing for versioning.

To address BasavarajAngadi's question, Databricks captures versions through the transaction log for each DML operation on Delta tables. The number of versions created depends on how often the data changes: every time you perform a DML operation, a new version is recorded in the transaction log.

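As for accessing versions from the past, Databricks time travel lets you query older versions for historical analysis and rollback. Retrieving versions from six months back is possible only if the table's retention settings (delta.logRetentionDuration and delta.deletedFileRetentionDuration) cover that period and the old data files have not been removed by VACUUM.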
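For reference, here is a small sketch of the SQL typically used to list a table's versions and read an older one. The helper functions and the table name `my_db.events` are hypothetical and used only for illustration. The statements themselves (`DESCRIBE HISTORY`, `VERSION AS OF`, `TIMESTAMP AS OF`) are standard Delta SQL, and on a cluster each string would be passed to `spark.sql(...)`.

```python
def history_query(table: str) -> str:
    """List the versions recorded in the table's transaction log."""
    return f"DESCRIBE HISTORY {table}"


def version_query(table: str, version: int) -> str:
    """Read the table as of a specific version number."""
    if version < 0:
        raise ValueError("version must be non-negative")
    return f"SELECT * FROM {table} VERSION AS OF {version}"


def timestamp_query(table: str, timestamp: str) -> str:
    """Read the table as of a point in time, e.g. '2023-01-01 00:00:00'."""
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"


# Hypothetical table name and values, for illustration only.
queries = [
    history_query("my_db.events"),
    version_query("my_db.events", 12),
    timestamp_query("my_db.events", "2023-01-01 00:00:00"),
]
for q in queries:
    print(q)
```

A query against a version whose log entries or data files have already been cleaned up will fail, so it helps to check `DESCRIBE HISTORY` first.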

My advice to BasavarajAngadi (author) would be to explore the versioning and time travel functionalities within Databricks thoroughly. Use them wisely to maintain data consistency and traceability, and to enable easy rollback when necessary.
