cancel
Showing results for 
Search instead for 
Did you mean: 
Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.
cancel
Showing results for 
Search instead for 
Did you mean: 

Archival Strategy for Delta tables

Phani1
Valued Contributor

 

Hi Team, We would like to define the archival strategy for data. Could you please share best practices /guide me on the below are the 3 use cases 

Case-1: On-Prem SQL and Oracle Data which is more than 20 years and they wanted to bring them into cloud space. The purpose of this data is for auditing. Very rarely used.

Case-2:  On-going data ingestion that needs to be archived after a certain defined interval of time.

Case-3: Archive related data from Gold/Silver/Bronze. That is Identify data from the Gold layer that needs to be archived and then archive all the related data from Silver and Bronze

Regards,

Phanindra

 

1 REPLY 1

-werners-
Esteemed Contributor III

case 1: I'd extract the data from the db to a data lake (cold storage if that is possible, that is cheaper) using an ETL tool like Data Factory, Glue etc.  Then the archiving can take place.  Perhaps also create a backup of the data on a 2nd data lake.

case 2:  Ideally you know what will be archived.  With that knowledge, you can add an 'archived' column which is True for all the records in the archived data, also create an epoch column which contains an archiving run ID (like the year if you run yearly; or yearmonth etc).

If you do not have a way to know what is archived/will be archived, you will have to do a crosscheck with the PK of the table AFTER the archiving.  Records not present in the source table are archived.  This is of course only true if no records are removed from the table besides the archiving.

case3: I would not do that.  What if you need your bronze/silver data for a new setup, and you need all data?
If you want to archive bronze/silver, you can.  But I would do that independent of the gold layer. (so basically case 2).

Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!