cancel
Showing results for 
Search instead for 
Did you mean: 
Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.
cancel
Showing results for 
Search instead for 
Did you mean: 

Archival Strategy for Delta tables

Phani1
Valued Contributor II

 

Hi Team, We would like to define the archival strategy for data. Could you please share best practices /guide me on the below are the 3 use cases 

Case-1: On-Prem SQL and Oracle Data which is more than 20 years and they wanted to bring them into cloud space. The purpose of this data is for auditing. Very rarely used.

Case-2:  On-going data ingestion that needs to be archived after a certain defined interval of time.

Case-3: Archive related data from Gold/Silver/Bronze. That is Identify data from the Gold layer that needs to be archived and then archive all the related data from Silver and Bronze

Regards,

Phanindra

 

1 REPLY 1

-werners-
Esteemed Contributor III

case 1: I'd extract the data from the db to a data lake (cold storage if that is possible, that is cheaper) using an ETL tool like Data Factory, Glue etc.  Then the archiving can take place.  Perhaps also create a backup of the data on a 2nd data lake.

case 2:  Ideally you know what will be archived.  With that knowledge, you can add an 'archived' column which is True for all the records in the archived data, also create an epoch column which contains an archiving run ID (like the year if you run yearly; or yearmonth etc).

If you do not have a way to know what is archived/will be archived, you will have to do a crosscheck with the PK of the table AFTER the archiving.  Records not present in the source table are archived.  This is of course only true if no records are removed from the table besides the archiving.

case3: I would not do that.  What if you need your bronze/silver data for a new setup, and you need all data?
If you want to archive bronze/silver, you can.  But I would do that independent of the gold layer. (so basically case 2).

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group