
Databricks platform setup best practices in Azure

Phani1
Valued Contributor II

Could you please help me with the prerequisites, configuration steps, and best practices for setting up the Databricks platform and workspace in Azure (including subscription, subnet/network configuration, storage configuration best practices, etc.)?

2 REPLIES

Slash
Contributor

Sure, let's consider it step by step:

- Subscription - you can follow the setup recommended by Microsoft (links below). You should have one central "hub" subscription that serves as the central point of management and connectivity for the spokes. It typically contains network services such as virtual network gateways, Azure Firewall, and Azure Application Gateway. These services are shared among the spokes, which reduces duplicated resources and simplifies management and security.
Your Databricks workspace goes in the Data Landing Zone subscription, and the VNet in that subscription should be peered to the hub VNet.

Hub-spoke network topology in Azure - Azure Architecture Center | Microsoft Learn
Data landing zones - Cloud Adoption Framework | Microsoft Learn


- Networking - the most secure and recommended approach is to deploy Databricks into your own VNet (Deploy Azure Databricks in your Azure virtual network (VNet injection) - Azure Databricks | Microsof...)
and turn on secure cluster connectivity, which means the driver and workers will not have public IPs (Secure cluster connectivity - Azure Databricks | Microsoft Learn).
You can then also configure Private Link to enable private connectivity between users and their Databricks workspaces, and between clusters on the classic compute plane and the core services on the control plane within the Databricks workspace infrastructure.
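To make the networking options concrete, here is a rough sketch of the network-related parameters an ARM-style workspace deployment takes for VNet injection with secure cluster connectivity. The parameter names (customVirtualNetworkId, customPublicSubnetName, customPrivateSubnetName, enableNoPublicIp) are written from memory of the Microsoft.Databricks/workspaces template, so check them against the current reference before use. The helper name, the VNet ID and the subnet names are placeholders I made up.

```python
import json


def workspace_network_parameters(vnet_id, host_subnet, container_subnet, no_public_ip=True):
    """Build the network-related parameters for a VNet-injected workspace.

    Hypothetical helper; the keys are assumed to match the ARM template
    parameters for Microsoft.Databricks/workspaces.
    """
    return {
        "customVirtualNetworkId": {"value": vnet_id},
        # The host subnet is the one the docs also call the "public" subnet.
        "customPublicSubnetName": {"value": host_subnet},
        # The container subnet is the one the docs also call the "private" subnet.
        "customPrivateSubnetName": {"value": container_subnet},
        # True turns on secure cluster connectivity (no public IPs on nodes).
        "enableNoPublicIp": {"value": no_public_ip},
    }


if __name__ == "__main__":
    params = workspace_network_parameters(
        vnet_id=(
            "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
            "/providers/Microsoft.Network/virtualNetworks/<vnet-name>"
        ),
        host_subnet="<host-subnet-name>",
        container_subnet="<container-subnet-name>",
    )
    print(json.dumps(params, indent=2))
```

The printed JSON could be pasted into the parameters section of a workspace resource definition. Private Link needs additional resources (private endpoints, DNS) that are not shown here.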

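Subnets - if you deploy to your own VNet, you will have to create 2 subnets dedicated to the workspace:
- container subnet (also called the private subnet)

- host subnet (also called the public subnet)

Each cluster node, whether driver or worker, takes one IP address in each of the two subnets.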

The documentation below contains Microsoft's recommendations for sizing the subnets based on the cluster size you need:
Deploy Azure Databricks in your Azure virtual network (VNet injection) - Azure Databricks | Microsof...
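As a quick sanity check on subnet sizing, here is a small sketch that estimates how many cluster nodes a pair of subnets can hold. It assumes each node takes one IP address in the host subnet and one in the container subnet, and that Azure reserves five addresses in every subnet. The function name and the example CIDR ranges are made up for illustration.

```python
import ipaddress

# Azure reserves five IP addresses in every subnet.
AZURE_RESERVED_IPS = 5


def max_cluster_nodes(container_cidr: str, host_cidr: str) -> int:
    """Estimate the maximum number of cluster nodes for two workspace subnets.

    Assumes every node (driver or worker) uses one address in each subnet,
    so the smaller subnet is the limiting factor.
    """
    container = ipaddress.ip_network(container_cidr)
    host = ipaddress.ip_network(host_cidr)
    if container.overlaps(host):
        raise ValueError("container and host subnets must not overlap")
    usable = [net.num_addresses - AZURE_RESERVED_IPS for net in (container, host)]
    return max(0, min(usable))


if __name__ == "__main__":
    # Two /26 subnets: 64 addresses each, minus 5 reserved by Azure.
    print(max_cluster_nodes("10.10.0.0/26", "10.10.0.64/26"))  # 59
```

This counts nodes running at the same time across all clusters in the workspace, so size for your peak concurrent usage, not for a single cluster.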

Storage - store data in a dedicated ADLS Gen2 account. Best practice is to use Unity Catalog with managed tables. If you need external tables, you can set up an external location. Follow the guide below to do this with Unity Catalog.

Connect to cloud object storage using Unity Catalog - Azure Databricks | Microsoft Learn
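For the external location part, a minimal sketch of what the setup might look like is below. It builds the abfss:// URL for an ADLS Gen2 container and the CREATE EXTERNAL LOCATION statement you would run in a notebook or SQL editor. The helper names, container, storage account, location and credential names are all hypothetical, and the storage credential is assumed to already exist in Unity Catalog.

```python
def abfss_url(container: str, account: str, path: str = "") -> str:
    """Build the abfss:// URL for a path in an ADLS Gen2 container."""
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    clean_path = path.strip("/")
    return f"{base}/{clean_path}" if clean_path else base


def create_external_location_sql(name: str, url: str, credential: str) -> str:
    """Return a CREATE EXTERNAL LOCATION statement for Unity Catalog.

    The storage credential named here must already exist; creating it
    (for example from an access connector) is not covered in this sketch.
    """
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
        f"URL '{url}' "
        f"WITH (STORAGE CREDENTIAL `{credential}`)"
    )


if __name__ == "__main__":
    # All names below are placeholders.
    url = abfss_url("raw", "mystorageacct", "sales/")
    print(create_external_location_sql("raw_sales", url, "my_storage_credential"))
```

In a Databricks notebook the resulting string could be passed to spark.sql(...). Managed tables do not need this step, since Unity Catalog handles their storage paths.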

Rishabh_Tiwari
Community Manager

Hi @Phani1 ,

Thank you for reaching out to our community! We're here to help you. 

To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedback not only helps us assist you better but also benefits other community members who may have similar questions in the future.

If you found the answer helpful, consider giving it a kudo. If the response fully addresses your question, please mark it as the accepted solution. This will help us close the thread and ensure your question is resolved.

We appreciate your participation and are here to assist you further if you need it!

Thanks,

Rishabh
