
Databricks platform setup best practices in Azure

Phani1
Valued Contributor

Could you please help me with the prerequisites, configuration steps, and best practices for the Databricks platform and workspace setup in Azure (including subscription, subnet/network configuration, storage configuration best practices, etc.)?

2 REPLIES

Slash
New Contributor III

Sure, let's consider it step by step:

- Subscription - you can follow the setup recommended by Microsoft (links below). You should have one central "hub" subscription that serves as the central point of management and connectivity for the spokes. It typically contains shared network services such as virtual network gateways, Azure Firewall, and Azure Application Gateway. Sharing these services among the spokes reduces duplicated resources and simplifies management and security.
Your Databricks workspace will live in the Data Landing Zone subscription, and the VNet in that subscription should be peered to the hub VNet.

Hub-spoke network topology in Azure - Azure Architecture Center | Microsoft Learn
Data landing zones - Cloud Adoption Framework | Microsoft Learn
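Since the landing zone VNet has to be peered to the hub VNet, and peered VNets cannot have overlapping address spaces, it can help to sanity-check the address plan before creating anything. A minimal sketch using only the Python standard library; the VNet names and CIDR ranges are made-up examples, not values from the Microsoft guidance:

```python
import ipaddress
from itertools import combinations

# Hypothetical address plan; replace with your own ranges.
vnets = {
    "hub": "10.0.0.0/16",
    "data-landing-zone": "10.1.0.0/16",
    "sandbox": "10.1.128.0/20",  # deliberately overlaps the landing zone
}

def find_overlaps(plan):
    """Return sorted pairs of VNet names whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    return sorted(
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    )

print(find_overlaps(vnets))  # [('data-landing-zone', 'sandbox')]
```

Any pair it reports would have to be re-addressed before the VNets can be peered.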


- Networking - the most secure and recommended approach is to deploy Databricks into your own VNet (Deploy Azure Databricks in your Azure virtual network (VNet injection) - Azure Databricks | Microsof...)
and turn on secure cluster connectivity, which means the driver and workers will not have public IPs (Secure cluster connectivity - Azure Databricks | Microsoft Learn).
You can then also configure Private Link to enable private connectivity between users and their Databricks workspaces, and between clusters on the classic compute plane and the core services on the control plane within the Databricks workspace infrastructure.
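If you deploy the workspace from an ARM/Bicep template, VNet injection and secure cluster connectivity are, as far as I know, driven by a handful of custom parameters on the Microsoft.Databricks/workspaces resource. The sketch below only builds that parameters object in Python; the helper name, the subnet names, and the placeholder resource ID are assumptions for illustration, and the parameter names should be checked against the current resource reference before use:

```python
import json

def workspace_network_parameters(vnet_id, host_subnet, container_subnet):
    """Build the custom parameters for a VNet-injected workspace.

    host_subnet is the "public" subnet and container_subnet the "private"
    one; enableNoPublicIp turns on secure cluster connectivity.
    """
    return {
        "customVirtualNetworkId": {"value": vnet_id},
        "customPublicSubnetName": {"value": host_subnet},
        "customPrivateSubnetName": {"value": container_subnet},
        "enableNoPublicIp": {"value": True},
    }

# Placeholder values only; nothing here refers to a real environment.
params = workspace_network_parameters(
    vnet_id=(
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.Network/virtualNetworks/<vnet-name>"
    ),
    host_subnet="snet-dbx-host",
    container_subnet="snet-dbx-container",
)
print(json.dumps(params, indent=2))
```

Private Link is configured separately (private endpoints plus the workspace's public network access setting), so it is left out of this sketch.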

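Subnets - if you deploy into your own VNet, you will have to create 2 subnets dedicated to the workspace:
- container subnet (also called the private subnet)
- host subnet (also called the public subnet)

Each cluster node, whether driver or worker, takes one IP address in each of the two subnets.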

The documentation below contains Microsoft's recommendations for sizing the subnets according to the cluster sizes you expect:
Deploy Azure Databricks in your Azure virtual network (VNet injection) - Azure Databricks | Microsof...
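To get a feel for the sizing, here is a rough calculation, assuming that Azure reserves 5 addresses in every subnet and that each cluster node needs one address in the host subnet and one in the container subnet (so both subnets need the same capacity). The function names and the example CIDRs are my own; treat the numbers as an estimate and confirm them against the documentation above:

```python
import ipaddress

AZURE_RESERVED_IPS = 5  # addresses Azure keeps for itself in every subnet

def max_cluster_nodes(subnet_cidr):
    """Upper bound on concurrently running nodes for one subnet."""
    net = ipaddress.ip_network(subnet_cidr)
    return max(net.num_addresses - AZURE_RESERVED_IPS, 0)

def smallest_prefix_for(nodes, min_prefix=26):
    """Smallest subnet (largest prefix length) that fits `nodes` nodes.

    Starts at /26, which I take to be the smallest size Databricks accepts.
    """
    for prefix in range(min_prefix, 0, -1):
        if 2 ** (32 - prefix) - AZURE_RESERVED_IPS >= nodes:
            return prefix
    raise ValueError("node count does not fit in an IPv4 subnet")

for cidr in ("10.1.0.0/26", "10.1.4.0/22", "10.1.128.0/17"):
    print(cidr, max_cluster_nodes(cidr))
# 10.1.0.0/26 59
# 10.1.4.0/22 1019
# 10.1.128.0/17 32763

print(smallest_prefix_for(100))  # 25
```

The count covers all clusters running in the workspace at the same time, so size for the peak across the whole workspace, not for a single cluster.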

Storage - store data in a dedicated ADLS Gen2 account. Best practice is to use Unity Catalog with managed tables. If you need external tables, you can set up an external location. Follow the guide below to do this with Unity Catalog.

Connect to cloud object storage using Unity Catalog - Azure Databricks | Microsoft Learn
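As a rough illustration of the external location step, the sketch below builds the abfss:// URL for an ADLS Gen2 container and the matching CREATE EXTERNAL LOCATION statement. The container, storage account, location, and credential names are hypothetical, and it assumes a storage credential has already been created in Unity Catalog:

```python
def abfss_url(container, account, path=""):
    """Build an ADLS Gen2 URL of the form abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    path = path.strip("/")
    return f"{base}/{path}" if path else base

def create_external_location_sql(name, url, credential):
    """Render the SQL statement; does no escaping, so pass trusted names only."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}`\n"
        f"URL '{url}'\n"
        f"WITH (STORAGE CREDENTIAL `{credential}`)"
    )

# Hypothetical names, for illustration only.
url = abfss_url("raw", "mydatalake", "sales/2024")
print(create_external_location_sql("raw_sales", url, "my_storage_credential"))
```

In a Databricks notebook the generated statement could be passed to spark.sql(...); the helpers themselves are plain Python and do not touch the workspace.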

Rishabh_Tiwari
Community Manager

Hi @Phani1 ,

Thank you for reaching out to our community! We're here to help you. 

To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedback not only helps us assist you better but also benefits other community members who may have similar questions in the future.

If you found the answer helpful, consider giving it a kudo. If the response fully addresses your question, please mark it as the accepted solution. This will help us close the thread and ensure your question is resolved.

We appreciate your participation and are here to assist you further if you need it!

Thanks,

Rishabh
