Procedure for retrieving archived data from a Delta table
01-05-2025 08:11 PM
Hi all,
I am currently researching the archival support feature in Databricks: https://docs.databricks.com/en/optimizations/archive-delta.html
Let's say I have enabled archival support and configured data to be archived after 5 years, and I have also configured a lifecycle management policy to move data files to the archive tier after 5 years.
I would like to know the procedure for retrieving the archived data. As I understand it, I should first move the corresponding data files from the archive tier to the hot tier on the storage side.
What should I do on the Databricks side if I want to retrieve the data 1) before 7 years and 2) from the very beginning?
I would highly appreciate it if someone could help me with this question. Thanks in advance.
- Labels: Delta Lake
01-06-2025 04:06 AM
If you want to retrieve data before 7 years, ensure that the delta.timeUntilArchived property is set to a value that reflects the archival policy (e.g., 5 years).
Identify the files to restore with the SHOW ARCHIVED FILES command, then follow the cloud provider's instructions to move those files back to the hot tier.
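As a rough illustration of the property setting mentioned above, the sketch below builds the ALTER TABLE statement as a plain Python string. The helper name `set_time_until_archived_sql` and the table name `my_catalog.my_schema.events` are hypothetical, and the `'<n> days'` value format is an assumption based on the archival support documentation linked in the question. On a Databricks cluster the resulting string would typically be passed to `spark.sql(...)`.

```python
def set_time_until_archived_sql(table_name: str, days: int) -> str:
    """Build an ALTER TABLE statement that sets delta.timeUntilArchived.

    Hypothetical helper: it only formats a string and does not talk to Databricks.
    table_name is interpolated as-is, so pass trusted identifiers only.
    """
    if days <= 0:
        raise ValueError("days must be a positive integer")
    return (
        f"ALTER TABLE {table_name} "
        f"SET TBLPROPERTIES (delta.timeUntilArchived = '{days} days')"
    )


# 5 years expressed in days, matching the lifecycle policy in the question.
stmt = set_time_until_archived_sql("my_catalog.my_schema.events", 5 * 365)
print(stmt)
```

The value should match the storage-side lifecycle policy, so that Databricks and the object store agree on which files are in the archive tier.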
01-06-2025 08:02 PM
@Walter_C Thank you for your reply. However, some parts need further clarification.
Assume I have already set delta.timeUntilArchived to 1825 days (5 years) and configured the lifecycle policy to align with the Databricks setting, so that files are moved to the archive tier on the storage side 5 years after creation.
Later, I have a requirement to retrieve data before 7 years. I expect that part of the data has been moved to the archive tier and needs to be restored. Should I change delta.timeUntilArchived to 2555 days (7 years), or keep it as is at 1825 days (5 years)?
I would also like to confirm whether my understanding of the procedure for restoring archived data is correct. This is what I have in mind:
Step 1: Run SHOW ARCHIVED FILES to check which data files need to be moved back to the hot tier
Step 2: Move the files back to the hot tier on the storage side
Step 3: Update the delta.timeUntilArchived setting to 2555 days (7 years) on the Databricks side
I assume the procedure is the same for both cases, 1) before 7 years and 2) the whole history, right?
Please correct me if I have misunderstood anything. Thank you.
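To make the 5-year and 7-year thresholds concrete, here is a small date-arithmetic sketch. It assumes, as a simplification, that a file counts as archived once its creation date is older than the delta.timeUntilArchived interval; the helper names, file names, and dates are made up for illustration and do not come from any real table.

```python
from datetime import date, timedelta


def days_before(today: date, days: int) -> date:
    """Return the calendar date that lies `days` days before `today`."""
    return today - timedelta(days=days)


def files_to_restore(files, query_start: date, archive_cutoff: date):
    """Return paths of files created in [query_start, archive_cutoff).

    Assumption: files created before archive_cutoff sit in the archive tier,
    and only files created on or after query_start are needed by the query.
    """
    return [path for path, created in files if query_start <= created < archive_cutoff]


today = date(2025, 1, 6)
archive_cutoff = days_before(today, 5 * 365)  # 1825 days: current archival threshold
query_start = days_before(today, 7 * 365)     # 2555 days: how far back the query reads

# Hypothetical data files with their creation dates.
files = [
    ("part-0001.parquet", date(2017, 6, 1)),   # older than 7 years: not needed
    ("part-0002.parquet", date(2019, 3, 15)),  # between 5 and 7 years: archived and needed
    ("part-0003.parquet", date(2023, 9, 30)),  # newer than 5 years: still in the hot tier
]
restore = files_to_restore(files, query_start, archive_cutoff)
print(archive_cutoff, query_start, restore)
```

In practice the list of files would come from SHOW ARCHIVED FILES rather than from a hand-written list like this one.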
01-08-2025 07:07 AM
To retrieve data before 7 years, you do not need to change the delta.timeUntilArchived setting from 1825 days (5 years) to 2555 days (7 years). You can keep it as is. The procedure for restoring archived data is as follows:
- Run SHOW ARCHIVED FILES: Use the SHOW ARCHIVED FILES command to identify the files that need to be moved back to the hot tier. The syntax is: SHOW ARCHIVED FILES FOR <table_name> [ WHERE <predicate> ]; This operation returns URIs for archived files as a Spark DataFrame.
- Move Files Back to Hot Tier on Storage Side: Restore the necessary archived files following documented instructions from your object storage provider.
- Update delta.timeUntilArchived Setting: If you need to access data older than the current archival threshold, update the delta.timeUntilArchived setting to the new value (e.g., 2555 days for 7 years). This step ensures that Databricks recognizes the restored files as part of the active dataset.
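The first step of the list above could be sketched as follows. This is only a string-building helper following the syntax quoted in the reply; the function name `show_archived_files_sql`, the table name, and the `event_date` predicate are hypothetical. On Databricks the query would typically be run with `spark.sql(query)`, and the returned DataFrame of URIs would then drive the storage-side restore.

```python
from typing import Optional


def show_archived_files_sql(table_name: str, predicate: Optional[str] = None) -> str:
    """Build a SHOW ARCHIVED FILES query, optionally restricted by a predicate.

    Hypothetical helper: inputs are interpolated as-is, so pass trusted values only.
    """
    query = f"SHOW ARCHIVED FILES FOR {table_name}"
    if predicate:
        query += f" WHERE {predicate}"
    return query


# Whole history: no predicate, every archived file is listed.
all_files_query = show_archived_files_sql("my_catalog.my_schema.events")

# Limited range: only archived files needed for rows matching the predicate.
ranged_query = show_archived_files_sql(
    "my_catalog.my_schema.events", "event_date >= '2018-01-01'"
)
print(all_files_query)
print(ranged_query)
```

Restricting the listing with a predicate keeps the restore request on the storage side limited to the files the query actually needs.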
01-08-2025 05:53 PM
@Walter_C Thank you for the reply. However, it is a bit confusing. At the beginning of your reply you said I do not need to change the delta.timeUntilArchived setting, but in step 3 you said I have to update it.
Do you mean that I should not change delta.timeUntilArchived before moving the files to the hot tier, but that after moving them I need to change the setting in order to query the restored data?
Could you please elaborate? Thank you very much.

