I have a large Delta table that I would like to back up, and I am wondering what the best practice is.
The goal is to be able to restore the data if there is any accidental corruption or data loss, either at the Azure Blob Storage level or within Databricks itself.
Is using the Azure Blob Storage "point-in-time restore" feature appropriate? On paper, it sounds like it has all the features I require. However, what is the downstream effect of using it on a Delta table? Will my weekly OPTIMIZE rewrite the underlying data files and blow out the costs?
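For context, the weekly maintenance I run is roughly the following (the table name is a placeholder, not my real table):

```python
from pyspark.sql import SparkSession

# In a Databricks notebook/job, `spark` is already provided; this line
# just makes the snippet self-contained.
spark = SparkSession.builder.getOrCreate()

# OPTIMIZE compacts small files into larger ones, which rewrites data
# files under the table's storage path. Those rewritten blobs are what
# I suspect point-in-time restore would end up tracking, inflating the
# storage and restore costs.
spark.sql("OPTIMIZE prod.sales.transactions")
```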
Other Azure/Databricks documentation mentions using Deep Clone for data replication, along the lines of the sketch below.
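If Deep Clone is the recommended route, I assume the scheduled backup job would look something like this (catalog/schema/table names are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# DEEP CLONE copies both the table's data files and its metadata to the
# target. If I understand the docs correctly, re-running
# CREATE OR REPLACE ... DEEP CLONE is incremental: only files added or
# changed in the source since the last clone are copied.
spark.sql("""
    CREATE OR REPLACE TABLE backup.sales.transactions
    DEEP CLONE prod.sales.transactions
""")
```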
Any thoughts appreciated.