11-16-2021 01:40 AM
Hello,
Has anyone tried to create an incremental backup of Delta tables? What I mean is loading into the backup storage only the latest Parquet files that are part of the Delta table and refreshing the _delta_log folder, instead of copying all the files again and again.
The principle I base this method on is that when new data is added to the Delta table, a new Parquet file is added, so it should be possible to copy only those new files. Can a Parquet file be changed after its creation?
I am curious whether anyone else has tried this, whether you think it is a valid idea, and how it would compare with Deep Clone in terms of speed and resources spent.
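To make the idea concrete, here is a minimal sketch of how the candidate files for an incremental copy could be found. It relies on the Delta transaction log format, where each commit is a newline-delimited JSON file named by its zero-padded version and every newly written data file appears as an `add` action with a `path`. Delta does not modify a data file in place; updates and deletes write new files and log `remove` actions for the old ones. The function name `files_added_since` and the parameter `last_backed_up_version` are hypothetical, and the sketch assumes the log directory is readable as a local path and that the JSON commits of interest have not yet been cleaned up.

```python
import json
from pathlib import Path


def files_added_since(log_dir, last_backed_up_version):
    """Return data-file paths added by commits newer than the given version.

    Hypothetical helper: assumes log_dir is a locally readable _delta_log folder.
    """
    added = set()
    for commit in Path(log_dir).glob("*.json"):
        # Commit files are named by zero-padded version,
        # e.g. 00000000000000000003.json; skip anything else.
        if not commit.stem.isdigit():
            continue
        if int(commit.stem) <= last_backed_up_version:
            continue
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            # Only "add" actions introduce new data files.
            if "add" in action:
                added.add(action["add"]["path"])
    return sorted(added)
```

The returned paths are usually relative to the table root, so a backup job would copy those files plus the new `_delta_log` entries. Files listed here may already have been removed again by a later commit, so this is a superset of what the latest version needs.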
11-17-2021 11:47 AM
Hi @Stefan Stegaru ,
You can use Delta time travel to query the data that was added in a specific version. Then, as @Hubert Dudek mentioned, you can copy this subset of data to a new table or a new location. You will need to do a deep clone to copy over the data from the source. Docs here
11-16-2021 02:23 AM
I'm gonna answer this with a question 🙂
How are you going to rebuild the latest state of the delta lake table?
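One way to think about this question: the latest state of a Delta table is whatever remains after replaying the transaction log, adding every file named in an `add` action and dropping every file named in a `remove` action. The sketch below illustrates that replay with the standard library only. The function name `live_files` is hypothetical, and it assumes every commit since version 0 is still present as a JSON file in a locally readable log directory (no reliance on checkpoints), which will not hold for long-lived tables.

```python
import json
from pathlib import Path


def live_files(log_dir):
    """Replay JSON commits in version order; return paths still in the table.

    Hypothetical helper: ignores checkpoints and assumes a complete JSON log.
    """
    live = set()
    commits = [p for p in Path(log_dir).glob("*.json") if p.stem.isdigit()]
    for commit in sorted(commits, key=lambda p: int(p.stem)):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                # The file is no longer part of the table from this version on.
                live.discard(action["remove"]["path"])
    return sorted(live)
```

A backup made of copied Parquet files is only restorable if the copied log and the copied files agree, that is, if every path this replay returns exists in the backup location.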
11-16-2021 03:31 AM
CREATE OR REPLACE TABLE shared_table CLONE my_prod_table;
%sql
VACUUM delta.`<path-to-table>` RETAIN 0 HOURS