
Delta Tables incremental backup method

SRS
New Contributor II

Hello,

Has anyone tried to create an incremental backup of Delta tables? What I mean is copying only the latest Parquet files that are part of the Delta table into the backup storage and refreshing the _delta_log folder, instead of copying all the files again and again.

The principle I base this method on is that when new data is added to a Delta table, a new Parquet file is added, so it should be possible to copy only those new files. Can a Parquet file be changed after it is created?
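To make the idea concrete: per the Delta transaction protocol, data files are not modified in place once written. Updates and deletes write new files and record `remove` actions for the old ones in `_delta_log`. Below is a minimal Python sketch of finding what changed since the last backup by reading the JSON commit files. The function name `files_added_since` and the `last_backed_up_version` bookkeeping are hypothetical, and the sketch ignores checkpoint Parquet files and `_last_checkpoint`, which a real backup would also need to copy.

```python
import json
import re
from pathlib import Path

# Delta commit files are named with a zero-padded 20-digit version,
# e.g. 00000000000000000003.json
COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def files_added_since(log_dir, last_backed_up_version):
    """Return (commit_files, data_files) written after the given version.

    Hypothetical helper: it only reads JSON commits and collects the
    relative paths found in their `add` actions.
    """
    commit_files, data_files = [], []
    for entry in sorted(Path(log_dir).iterdir()):
        match = COMMIT_RE.match(entry.name)
        if not match or int(match.group(1)) <= last_backed_up_version:
            continue  # not a commit file, or already backed up
        commit_files.append(entry.name)
        # Each line of a commit file is one JSON action (add, remove, metaData, ...)
        for line in entry.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                data_files.append(action["add"]["path"])
    return commit_files, data_files
```

The paths in `add` actions are relative to the table root, so the returned lists describe which commit files and data files to copy. Files referenced by `remove` actions stay in the backup until they are vacuumed there.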

I am curious whether anyone else has tried this, whether you think it is a valid idea, and how it would compare with Deep Clone in terms of speed and resources spent.

1 ACCEPTED SOLUTION

jose_gonzalez
Moderator

Hi @Stefan Stegaru,

You can use Delta time travel to query the data that was added in a specific version. Then, as @Hubert Dudek mentioned, you can copy this subset of data to a new table or a new location. You will need to do a deep clone to copy the data over from the source. Docs here
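As a rough illustration of the deep clone route, here is a small Python sketch that issues the clone statement through a Spark session, optionally pinned to a version with time travel. The helper name `incremental_backup` is hypothetical, and the table names are the ones used elsewhere in this thread. To my understanding of the Databricks clone documentation, re-running `CREATE OR REPLACE TABLE ... DEEP CLONE` against the same target syncs only the changes since the previous clone, but treat that as an assumption to verify on your runtime.

```python
def incremental_backup(spark, source_table, backup_table, version=None):
    """Deep clone source_table into backup_table and return the SQL that was run.

    `spark` is assumed to be a SparkSession (anything exposing `.sql(str)` works).
    If `version` is given, the clone is pinned to that table version.
    """
    statement = f"CREATE OR REPLACE TABLE {backup_table} DEEP CLONE {source_table}"
    if version is not None:
        # Time travel: clone the table as it was at this version
        statement += f" VERSION AS OF {int(version)}"
    spark.sql(statement)
    return statement
```

In a Databricks notebook this could be called as `incremental_backup(spark, "my_prod_table", "shared_table")` on a schedule, which lets the platform work out which files are new instead of doing it by hand.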


3 REPLIES

-werners-
Esteemed Contributor III

I'm gonna answer this with a question 🙂

How are you going to rebuild the latest state of the delta lake table?
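One way to read this question: the latest state is whatever you get by replaying the `add` and `remove` actions in `_delta_log`, so a file-level backup is only restorable if the log and every file it still references are copied together. A minimal Python sketch of that replay follows. The function name `live_files` is hypothetical, and it assumes all JSON commits from version 0 are still present (no reliance on checkpoints).

```python
import json
from pathlib import Path


def live_files(log_dir):
    """Replay JSON commits in version order and return the data files
    that make up the latest snapshot of the table."""
    live = set()
    # Zero-padded commit names sort lexicographically in version order
    for commit in sorted(Path(log_dir).glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live
```

If any path in the returned set is missing from the backup location, the backed-up table cannot be read at its latest version.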

Hubert-Dudek
Esteemed Contributor III
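  • Copy your Delta table to a new location (ideally ADLS/Blob Storage in another region):
CREATE OR REPLACE TABLE shared_table CLONE my_prod_table;

  • Vacuum all history in the new location (a retention shorter than the default 7 days requires disabling the retention safety check first):
%sql
SET spark.databricks.delta.retentionDurationCheck.enabled = false;
VACUUM delta.`<path-to-table>` RETAIN 0 HOURS
  • Remove <path-to-table>/_delta_log in the new location (this leaves plain Parquet files rather than a Delta table)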
