3 weeks ago
Hello Everyone
We are currently building out Azure Databricks and have some questions about best practices for the Azure storage account we will use to store data. Can anyone help me find best practices for the storage account, especially for backup and recovery scenarios?
Thank you!
Dinesh Kumar
3 weeks ago
"It depends..."
Unfortunately, Databricks only has some general best practices, and will encourage you to connect with your account team for specific advice. If you haven't met your account team yet, you should try to find them; they're a great resource.
As a cloud architect and Databricks administrator, here are some specific things I recommend for storage in Azure Databricks:
For some references:
3 weeks ago
Hi @Dnirmania ,
It appears that your question is primarily about best practices for Azure Storage accounts, specifically focusing on backup and recovery scenarios, rather than being directly related to Databricks. I recommend reviewing the following two Microsoft articles:
Azure Storage Redundancy: This article details the various redundancy options available in Azure Storage. Understanding these options will help you configure the appropriate level of data replication and resilience for your storage account, ensuring your data is stored securely and is highly available.
Azure Storage Disaster Recovery Guidance: This resource provides comprehensive guidance on planning and implementing a disaster recovery strategy for your Azure Storage account. It covers best practices for backup, recovery, and how to prepare for and execute a storage account failover.
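To make the redundancy options from the first article a bit more concrete, here is a minimal Python sketch that lists the six standard redundancy SKUs and picks the first one that meets a set of requirements. The helper `pick_redundancy` and its parameter names are hypothetical, and the "least to most resilient" ordering is a simplification that ignores pricing and account-type support, so treat it as a way to reason about the options rather than a recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redundancy:
    sku: str
    zone_redundant: bool   # copies spread across availability zones in the primary region
    geo_redundant: bool    # asynchronous copy to a secondary region
    read_secondary: bool   # secondary endpoint is readable without a failover


# Ordered roughly from least to most resilient (an assumption made for this sketch).
OPTIONS = [
    Redundancy("Standard_LRS", False, False, False),
    Redundancy("Standard_ZRS", True, False, False),
    Redundancy("Standard_GRS", False, True, False),
    Redundancy("Standard_RAGRS", False, True, True),
    Redundancy("Standard_GZRS", True, True, False),
    Redundancy("Standard_RAGZRS", True, True, True),
]


def pick_redundancy(need_zone, need_geo, need_read_secondary=False):
    """Return the first SKU in OPTIONS that satisfies every requested property."""
    for opt in OPTIONS:
        if need_zone and not opt.zone_redundant:
            continue
        if need_geo and not opt.geo_redundant:
            continue
        if need_read_secondary and not opt.read_secondary:
            continue
        return opt.sku
    # Unreachable with the table above: Standard_RAGZRS satisfies every combination.
    raise ValueError("no redundancy option matches the requirements")


print(pick_redundancy(need_zone=True, need_geo=True))
```

Whether geo-redundancy is worth the cost depends on your recovery objectives, which is what the second article helps you work out.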
3 weeks ago
Thanks @filipniziol for your suggestion. The articles you shared focus on general best practices and recommendations for storage accounts. What I'm looking for are Databricks-specific recommendations for configuring storage accounts.
3 weeks ago - last edited 3 weeks ago
Hi @Dnirmania ,
Best practice is to configure storage with Unity Catalog:
Connect to cloud object storage and services using Unity Catalog - Azure Databricks | Microsoft Lear...
But your question is about backup and recovery at the storage level. Those should be handled with Azure-native capabilities such as Azure Backup. The same applies to DR for Azure Storage; you can read more in the documentation linked below.
Databricks just uses the cloud provider's object storage to store data. It's your responsibility (or your Azure administrator's) to plan backup and disaster recovery scenarios.
Azure storage disaster recovery planning and failover - Azure Storage | Microsoft Learn
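For the Unity Catalog part, a rough sketch of what registering a storage path looks like may help. The Python below builds an `abfss://` URL for an ADLS Gen2 container and the matching `CREATE EXTERNAL LOCATION` statement. The account, container, location, and credential names are made up for illustration, both helper functions are hypothetical, and the sketch assumes the storage credential already exists in Unity Catalog.

```python
def abfss_url(account, container, path=""):
    """Build an ADLS Gen2 URL of the form abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    path = path.strip("/")
    return f"{base}/{path}" if path else base


def create_external_location_sql(name, url, credential):
    """Return a CREATE EXTERNAL LOCATION statement for an existing storage credential."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
        f"URL '{url}' "
        f"WITH (STORAGE CREDENTIAL `{credential}`)"
    )


# Hypothetical names, for illustration only.
url = abfss_url("mydatalake", "raw", "landing")
print(create_external_location_sql("raw_landing", url, "my_storage_credential"))
```

In a Databricks notebook the resulting string could be passed to `spark.sql(...)`. Backup and DR settings still live on the storage account itself, not in this statement.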
2 weeks ago
Thanks for sharing your knowledge with us. It will definitely help me and other data engineers. Thanks once again.
2 weeks ago
Thanks for sharing @Rjdudley @szymon_dybczak @filipniziol
2 weeks ago
To follow up, you can actually back up blobs: Overview of Azure Blobs backup - Azure Backup | Microsoft Learn, including to on-premises. Obviously on-premises capacity is a large question. I excluded this because I question what you would accomplish with the cloud backup option that wouldn't be better served by geo-replication, but in the interest of thoroughness I felt I had to mention this option. Up to your specific needs, though.