Data Engineering
What is the best strategy for backing up a large Databricks Delta table that is stored in Azure blob storage?

deisou
New Contributor

I have a large Delta table that I would like to back up, and I am wondering what the best practice is for backing it up.

The goal is that if there is any accidental corruption or data loss, either at the Azure Blob Storage level or within Databricks itself, I can restore the data.

Is using the Azure Blob Storage point-in-time restore feature appropriate? On paper, it sounds like it has all the features I require. However, what is the downstream effect of using it on a Delta table, and will a weekly OPTIMIZE cause rewrites of the data and blow out the costs?

In other Azure/Databricks documentation, there was mention of using Deep Clone for data replication.

Any thoughts appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions

AmanSehgal
Honored Contributor III

Deep Clone should do a good job of backing up Delta tables.
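A minimal sketch of what this could look like in Databricks SQL; the table names (`sales`, `sales_backup`) and the storage path are hypothetical placeholders, not from the original thread:

```sql
-- Create a full, independent copy of the source table at a backup location.
-- A deep clone copies both the data files and the table metadata.
CREATE TABLE IF NOT EXISTS sales_backup
DEEP CLONE sales
LOCATION 'abfss://backups@mystorageaccount.dfs.core.windows.net/delta/sales_backup';

-- Re-running a deep clone refreshes it incrementally: only files added
-- to the source since the last clone are copied.
CREATE OR REPLACE TABLE sales_backup
DEEP CLONE sales;

-- To restore after corruption or data loss, clone back the other way.
CREATE OR REPLACE TABLE sales
DEEP CLONE sales_backup;
```

Because the clone is a normal Delta table in its own storage location, it is unaffected by an OPTIMIZE run on the source until the next clone refresh.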


4 REPLIES

Hubert-Dudek
Esteemed Contributor III

You can also set up a copy process in Azure Data Factory.

-werners-
Esteemed Contributor III

Big advantage of file-based storage (compared to an RDBMS): copy/paste 🙂

Anonymous
Not applicable

Hi @deisou​ 

Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark the answer as best? If not, please tell us so we can help you.

Cheers!