11-16-2021 01:40 AM
Hello,
Has anyone tried to create an incremental backup of Delta tables? What I mean is copying into the backup storage only the newest Parquet files that belong to the Delta table and refreshing the _delta_log folder, instead of copying all the files again and again.
The principle this method is based on is that when new data is added to a Delta table, new Parquet files are added, so it should be possible to copy only those new files. Can a Parquet file be changed after it has been created?
I am curious whether anyone else has tried this, whether you think it is a valid idea, and how it would compare with Deep Clone in terms of speed and resources used.
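The incremental idea in the question can be sketched in Python by reading the transaction log: copy each JSON commit that the backup does not have yet, plus the data files its `add` actions reference. This is a minimal sketch under stated assumptions, not a Delta Lake API: `incremental_backup` and `commit_versions` are hypothetical helpers, the table is on a local or mounted filesystem, every JSON commit since version 0 is still present in `_delta_log` (no log cleanup yet), data-file paths are relative, and checkpoint files and deletion vectors are ignored.

```python
"""Minimal sketch of an incremental Delta table backup.

Assumptions (not a Delta Lake API): local or mounted filesystem, every JSON
commit since version 0 still in _delta_log, relative data-file paths only,
checkpoints and deletion vectors ignored. `incremental_backup` is hypothetical.
"""
import json
import re
import shutil
from pathlib import Path
from urllib.parse import unquote

# JSON commits are named with a 20-digit zero-padded version number.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def commit_versions(log_dir: Path) -> list[int]:
    """Sorted versions of the JSON commit files in a _delta_log directory."""
    if not log_dir.is_dir():
        return []
    return sorted(
        int(m.group(1))
        for m in (COMMIT_RE.match(p.name) for p in log_dir.iterdir())
        if m
    )


def incremental_backup(source: Path, backup: Path) -> list[str]:
    """Copy commits missing from `backup` plus the data files they add.

    Returns the relative paths of the data files copied in this run.
    """
    src_log, dst_log = source / "_delta_log", backup / "_delta_log"
    dst_log.mkdir(parents=True, exist_ok=True)
    already_backed_up = set(commit_versions(dst_log))
    copied = []
    for version in commit_versions(src_log):
        if version in already_backed_up:
            continue
        commit = src_log / f"{version:020d}.json"
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                # Log paths are relative to the table root and URL-encoded.
                rel = unquote(action["add"]["path"])
                target = backup / rel
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(source / rel, target)  # raises if already vacuumed
                copied.append(rel)
        # Copy the commit last, so an interrupted version is redone next run.
        shutil.copy2(commit, dst_log / commit.name)
    return copied
```

Delta never modifies a data file in place: UPDATE, DELETE and OPTIMIZE write new files and record `remove` actions for the old ones, so copying only newly added files is sound in principle. Two caveats of this sketch: the backup keeps logically removed files because it does not mirror VACUUM, and if the source was vacuumed before a commit was backed up, copying that commit's added files fails.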
- Labels: Backup, Delta, Delta table, Delta Tables
Accepted Solutions
11-17-2021 11:47 AM
Hi @Stefan Stegaru,
You can use Delta time travel to query the data that was added in a specific version. Then, as @Hubert Dudek mentioned, you can copy this subset of data to a new table or a new location. You will need to do a deep clone to copy the data from the source. Docs here
11-16-2021 02:23 AM
I'm gonna answer this with a question 🙂
How are you going to rebuild the latest state of the Delta Lake table?
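One way to see what that question is getting at: the current state of a Delta table is not simply a directory listing, it is the result of replaying the `_delta_log` commits in order, where an `add` action makes a data file part of the table and a `remove` action retires it. Time travel relies on the same log, since reading as of an earlier version means stopping the replay there. A minimal Python sketch, assuming a local or mounted filesystem, all JSON commits still present, and ignoring checkpoint files and deletion vectors; `live_files` is a hypothetical helper, not part of any Delta Lake library.

```python
"""Minimal sketch: rebuild a Delta table's live data files by replaying its log.

Assumptions: local or mounted filesystem, all JSON commits still present,
checkpoint files and deletion vectors ignored. `live_files` is hypothetical.
"""
import json
import re
from pathlib import Path
from typing import Optional
from urllib.parse import unquote

# JSON commits are named with a 20-digit zero-padded version number.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def live_files(table: Path, version: Optional[int] = None) -> set[str]:
    """Return the data files (relative paths) that make up `table` as of `version`.

    `version=None` means the latest commit.
    """
    log_dir = table / "_delta_log"
    commits = []
    for entry in log_dir.iterdir():
        match = COMMIT_RE.match(entry.name)
        if match:  # skips _last_checkpoint, checkpoint .parquet files, etc.
            commits.append((int(match.group(1)), entry))
    commits.sort()

    live = set()
    for v, commit in commits:
        if version is not None and v > version:
            break  # time travel: stop replaying after the requested version
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(unquote(action["add"]["path"]))
            elif "remove" in action:
                live.discard(unquote(action["remove"]["path"]))
    return live
```

For example, `live_files(path, v) - live_files(path, v - 1)` gives the files added in version `v`. Note that this also counts files rewritten without new rows (for example by OPTIMIZE), not only newly inserted data.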
11-16-2021 03:31 AM
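- Copy your Delta table to a new location (ideally ADLS/Blob Storage in another region):
  ```sql
  CREATE OR REPLACE TABLE shared_table DEEP CLONE my_prod_table LOCATION '<path-to-table>';
  ```
- Vacuum all history in the new location. `RETAIN 0 HOURS` is rejected by Delta's retention-duration safety check unless that check is disabled first:
  ```sql
  %sql
  SET spark.databricks.delta.retentionDurationCheck.enabled = false;
  VACUUM delta.`<path-to-table>` RETAIN 0 HOURS
  ```
- Remove `<path-to-table>/_delta_log` in the new location.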