12-02-2024 09:21 AM
Hello Everyone
We are currently in the process of building out Azure Databricks and have some doubts about best practices for the Azure storage account we will use to store data. Can anyone help me find best practices for the storage account, especially for backup and recovery scenarios?
Thank you!
Dinesh Kumar
12-03-2024 12:32 AM
Hi @Dnirmania ,
It appears that your question is primarily about best practices for Azure Storage accounts, specifically focusing on backup and recovery scenarios, rather than being directly related to Databricks. I recommend reviewing the following two Microsoft articles:
Azure Storage Redundancy: This article details the various redundancy options available in Azure Storage. Understanding these options will help you configure the appropriate level of data replication and resilience for your storage account, ensuring your data is stored securely and is highly available.
Azure Storage Disaster Recovery Guidance: This resource provides comprehensive guidance on planning and implementing a disaster recovery strategy for your Azure Storage account. It covers best practices for backup, recovery, and how to prepare for and execute a storage account failover.
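If you prefer to manage this as code, here is a minimal sketch (not taken from the articles above; subscription, resource group, and account names are placeholders) of checking and changing a storage account's redundancy SKU with the Azure SDK for Python:

```python
# Minimal sketch: inspect and upgrade a storage account's redundancy SKU.
# Requires azure-identity and azure-mgmt-storage; all names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import StorageAccountUpdateParameters, Sku

subscription_id = "<subscription-id>"
resource_group = "rg-databricks-data"   # hypothetical resource group
account_name = "stdatabricksdata"       # hypothetical storage account

client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

# Inspect the current redundancy option (e.g. Standard_LRS, Standard_GRS, Standard_RAGRS).
account = client.storage_accounts.get_properties(resource_group, account_name)
print("Current SKU:", account.sku.name)

# Upgrade to read-access geo-redundant storage for higher resilience.
client.storage_accounts.update(
    resource_group,
    account_name,
    StorageAccountUpdateParameters(sku=Sku(name="Standard_RAGRS")),
)
```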
12-03-2024 01:02 AM
Thanks @filipniziol for your suggestion. The articles you shared focus on general best practices and recommendations for storage accounts. What I'm looking for are Databricks-specific recommendations for configuring storage accounts.
12-03-2024 01:56 AM - edited 12-03-2024 02:17 AM
Hi @Dnirmania ,
Best practice is to configure storage with Unity Catalog:
Connect to cloud object storage and services using Unity Catalog - Azure Databricks | Microsoft Learn
But in your question you're asking about backup and recovery scenarios at the storage level. Those should be handled via Azure-native capabilities like Azure Backup. The same applies to DR for Azure Storage; you can read more in the documentation entry below.
Databricks just uses the object storage of the given cloud provider to store data. It's your responsibility (or your Azure administrator's) to plan backup and disaster recovery scenarios.
Azure storage disaster recovery planning and failover - Azure Storage | Microsoft Learn
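For the Unity Catalog side, here is a rough sketch of what that configuration can look like from a notebook. It assumes a storage credential (here called my_credential) already exists, backed by an access connector with the Storage Blob Data Contributor role on the account; the container, account, location, and group names are placeholders:

```python
# Rough sketch, run from a Databricks notebook with Unity Catalog enabled.
# Assumes the storage credential "my_credential" already exists.
# Container, account, location, and group names are placeholders.
spark.sql("""
  CREATE EXTERNAL LOCATION IF NOT EXISTS landing_zone
  URL 'abfss://landing@stdatabricksdata.dfs.core.windows.net/'
  WITH (STORAGE CREDENTIAL my_credential)
  COMMENT 'Landing zone for raw files'
""")

# Grant only the access each team actually needs on the location.
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION landing_zone TO `data_engineers`")
```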
12-04-2024 03:04 PM
"It depends..."
Unfortunately, Databricks only has some general best practices and will encourage you to connect with your account team for specific advice. If you haven't met your account team yet, you should try to find them; they're a great resource.
As a cloud architect and Databricks administrator, here are some specific things I recommend for storage in Azure Databricks:
For some references:
4 weeks ago
Thanks for sharing your knowledge with us. It will definitely help me and other data engineers. Thanks once again 😊
a month ago
Thanks for sharing @Rjdudley @szymon_dybczak @filipniziol
4 weeks ago
To follow up, you can actually back up blobs: Overview of Azure Blobs backup - Azure Backup | Microsoft Learn, including to on-premises. Obviously, on-premises capacity is a big question. I excluded this earlier because I question what you would accomplish with the cloud backup option that wouldn't be better served by geo-replication, but in the interest of thoroughness I felt I had to mention it. It comes down to your specific needs, though.
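If you do lean on geo-replication for recovery, it's worth checking replication health as part of your runbook. A minimal sketch with the Azure SDK for Python (subscription, resource group, and account names are placeholders, and the failover call is shown only as a comment because it's destructive):

```python
# Minimal sketch: check geo-replication health before relying on it for DR.
# Requires azure-identity and azure-mgmt-storage, and an account on a
# geo-redundant SKU (GRS/RA-GRS). All names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

subscription_id = "<subscription-id>"
resource_group = "rg-databricks-data"   # hypothetical resource group
account_name = "stdatabricksdata"       # hypothetical storage account

client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

# Expand geoReplicationStats to see when the secondary region was last in sync.
account = client.storage_accounts.get_properties(
    resource_group, account_name, expand="geoReplicationStats"
)
stats = account.geo_replication_stats
print("Replication status:", stats.status)
print("Last sync time:   ", stats.last_sync_time)

# In a real DR event you could fail over to the secondary region.
# This is destructive (the account becomes LRS in the new primary region),
# so it is left commented out here.
# client.storage_accounts.begin_failover(resource_group, account_name).result()
```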