Data Engineering

Procedure of retrieving archived data from delta table

Brianben
New Contributor III

Hi all,

I am currently researching the archive support feature in Databricks: https://docs.databricks.com/en/optimizations/archive-delta.html

Let's say I have enabled archive support and configured data to be archived after 5 years, and I have also configured a lifecycle management policy to move data files to the archive tier after 5 years.
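For readers following along, here is a minimal sketch of how the 5-year setting could be turned into the table property statement. The helper names and the table name `main.sales.orders` are hypothetical, and the sketch assumes a year is counted as 365 days (leap days ignored). It only builds the SQL string; running it against a table would need a Databricks session.

```python
DAYS_PER_YEAR = 365  # assumption: leap days are ignored, so 5 years = 1825 days


def years_to_days(years: int) -> int:
    """Convert a retention period in whole years to days."""
    return years * DAYS_PER_YEAR


def archive_property_sql(table_name: str, years: int) -> str:
    """Build the ALTER TABLE statement that sets delta.timeUntilArchived.

    table_name is a hypothetical placeholder; nothing is executed here.
    """
    days = years_to_days(years)
    return (
        f"ALTER TABLE {table_name} "
        f"SET TBLPROPERTIES (delta.timeUntilArchived = '{days} days')"
    )


if __name__ == "__main__":
    # Hypothetical table name, used only for illustration.
    print(archive_property_sql("main.sales.orders", 5))
```

In a notebook, the resulting string could be passed to `spark.sql(...)`. The storage-side lifecycle policy still has to be configured separately with the cloud provider.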

I would like to know the procedure for retrieving the archived data. As I understand it, I should first move the corresponding data files from the archive tier to the hot tier on the storage side.

What should I do on the Databricks side if I want to retrieve the data 1) from up to 7 years ago and 2) from the very beginning?

I would highly appreciate it if someone could help me with this question. Thanks in advance.

 

 

2 REPLIES

Walter_C
Databricks Employee

 

If you want to retrieve data from up to 7 years ago, ensure that the delta.timeUntilArchived property is set to a value that reflects the archival policy (e.g., 5 years).
Use the SHOW ARCHIVED FILES command to identify the files that need to be restored, then follow the cloud provider's instructions to move those files back to the hot tier.
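As a rough sketch of the lookup step, the helper below builds a SHOW ARCHIVED FILES statement with an optional filter. The function name, table name, and predicate are hypothetical, and the `FOR <table> [WHERE <predicate>]` form is my reading of the linked documentation page, so please check it against the current docs before relying on it.

```python
from typing import Optional


def show_archived_files_sql(table_name: str, predicate: Optional[str] = None) -> str:
    """Build a SHOW ARCHIVED FILES statement for a Delta table.

    Assumes the syntax `SHOW ARCHIVED FILES FOR <table> [WHERE <predicate>]`.
    The statement is only constructed here, not executed.
    """
    statement = f"SHOW ARCHIVED FILES FOR {table_name}"
    if predicate:
        statement += f" WHERE {predicate}"
    return statement


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(show_archived_files_sql("main.sales.orders", "order_date >= '2018-01-01'"))
```

In a Databricks notebook the string could be run with `spark.sql(...)`, and the returned file list then drives the restore on the storage side.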

 

Brianben
New Contributor III

@Walter_C Thank you for your reply. However, some parts need further clarification.

Assume I have already set delta.timeUntilArchived to 1825 days (5 years) and configured the lifecycle policy to match the Databricks setting, so that files are moved to the archive tier on the storage side 5 years after creation.

Some time later, I have a requirement to retrieve data from up to 7 years ago. I expect that part of this data has been moved to the archive tier and needs to be restored. Should I change delta.timeUntilArchived to 2555 days (7 years), or keep it as is at 1825 days (5 years)?

Also, I would like to confirm whether my understanding of the procedure for restoring archived data is correct. This is what I have in mind:

Step 1: Run SHOW ARCHIVED FILES to check which data files need to be moved back to the hot tier

Step 2: Move the files back to the hot tier on the storage side

Step 3: Update the delta.timeUntilArchived setting to 2555 days (7 years) on the Databricks side

I assume the procedure is the same for both cases, 1) up to 7 years ago and 2) the whole history, right?
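To make the two cases concrete, here is a small sketch of the arithmetic only. It is not how Databricks decides which files are archived (SHOW ARCHIVED FILES does that), and it does not answer whether the property should be changed. It assumes, hypothetically, that a file counts as archived once its age exceeds 1825 days, and that case 2 (whole history) is expressed as no lookback limit.

```python
from datetime import date, timedelta
from typing import Optional

ARCHIVE_AFTER_DAYS = 1825  # 5 years, the threshold used in the thread


def needs_restore(created: date, today: date, lookback_days: Optional[int]) -> bool:
    """Return True if a file is past the archive threshold and inside the
    requested lookback window. lookback_days=None means the whole history.

    Illustrative assumption: age is measured from file creation time.
    """
    age_days = (today - created).days
    is_archived = age_days > ARCHIVE_AFTER_DAYS
    is_wanted = lookback_days is None or age_days <= lookback_days
    return is_archived and is_wanted


if __name__ == "__main__":
    today = date(2025, 1, 1)
    for age in (100, 2000, 3000):
        created = today - timedelta(days=age)
        print(age, needs_restore(created, today, 2555), needs_restore(created, today, None))
```

Under these assumptions, the only difference between the two cases is the set of files to restore: case 1 stops at 2555 days, case 2 includes every archived file.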

Please kindly correct me if there is any misunderstanding. Thank you.
