Databricks setup/deployment checklist/best practices
11-28-2023 04:35 AM
Hi Team, could you please share any checklist or best practices for setting up and deploying Databricks, or point us to relevant guidance?
11-28-2023 04:37 AM
The Databricks platform is on Azure.
11-28-2023 08:07 AM
Hi @Phani1, here are some best practices: https://github.com/Azure/AzureDatabricksBestPractices/tree/master. You could treat the points below as your checklist.
Choose the right Databricks Workspace:
- Decide on the appropriate Azure region for your Databricks workspace.
- Consider using the Azure Portal to create the Databricks workspace or use infrastructure-as-code tools like ARM templates or Terraform.
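If you go the infrastructure-as-code route, a minimal ARM template for the workspace can be sketched as a Python dict like the one below. The workspace name, region, managed resource group, and `apiVersion` are placeholders you would replace with your own values.

```python
# Minimal ARM template fragment for an Azure Databricks workspace,
# expressed as a Python dict. Names, region, and apiVersion are illustrative.
import json

workspace_template = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "resources": [
        {
            "type": "Microsoft.Databricks/workspaces",
            "apiVersion": "2023-02-01",            # check the latest supported version
            "name": "my-databricks-workspace",     # illustrative name
            "location": "westeurope",              # choose your region deliberately
            "sku": {"name": "premium"},            # premium tier enables RBAC features
            "properties": {
                # Resource group Databricks manages on your behalf
                "managedResourceGroupId": "/subscriptions/<sub-id>/resourceGroups/my-databricks-managed-rg"
            },
        }
    ],
}

print(json.dumps(workspace_template, indent=2))
```

The same shape translates directly to a Terraform `azurerm_databricks_workspace` resource if you prefer Terraform over ARM.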
Authentication and Authorization:
- Integrate Databricks with Azure Active Directory (Azure AD) for authentication.
- Implement role-based access control (RBAC) to manage authorization.
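Once Azure AD users can sign in, groups are usually managed through the Databricks SCIM API. A hedged sketch of the request body for creating a workspace group (the group name, workspace URL, and token are placeholders):

```python
# SCIM 2.0 payload for creating a Databricks workspace group.
# The displayName is illustrative; authenticate with an Azure AD token.
import json

scim_group_payload = {
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"],
    "displayName": "data-engineers",  # illustrative group name
}

# POST {workspace_url}/api/2.0/preview/scim/v2/Groups
# Header: Authorization: Bearer <Azure AD token>
print(json.dumps(scim_group_payload))
```

Cluster, job, and notebook permissions can then be granted to the group rather than to individual users.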
Networking and Security:
- Configure Azure Virtual Network (VNet) settings for Databricks clusters.
- Utilize Azure Network Security Groups (NSGs) for firewall rules.
- Consider Private Link for enhanced security.
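VNet injection is configured at workspace creation time via ARM parameters. A sketch of the relevant parameter block, with the VNet resource ID and subnet names as placeholders for your own network layout:

```python
# ARM parameters for deploying the workspace into your own VNet
# (VNet injection). Resource IDs and subnet names are illustrative.
vnet_injection_parameters = {
    "customVirtualNetworkId": {
        "value": "/subscriptions/<sub-id>/resourceGroups/net-rg"
                 "/providers/Microsoft.Network/virtualNetworks/dbx-vnet"
    },
    "customPublicSubnetName": {"value": "databricks-public"},
    "customPrivateSubnetName": {"value": "databricks-private"},
    # Secure cluster connectivity: no public IPs on cluster nodes
    "enableNoPublicIp": {"value": True},
}
```

Both subnets must be delegated to `Microsoft.Databricks/workspaces`, and the NSG rules Databricks requires are attached to them.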
Use Azure Key Vault for Secrets:
- Store sensitive information such as API keys, passwords, and tokens in Azure Key Vault.
- Integrate Databricks with Azure Key Vault for secure access to secrets.
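The usual integration is a Key Vault-backed secret scope, created via the Secrets API. A sketch of the request body (the scope name, vault resource ID, and DNS name are placeholders):

```python
# Request body for creating an Azure Key Vault-backed secret scope.
# Scope name and Key Vault identifiers are illustrative.
secret_scope_payload = {
    "scope": "kv-backed-scope",
    "scope_backend_type": "AZURE_KEYVAULT",
    "backend_azure_keyvault": {
        "resource_id": "/subscriptions/<sub-id>/resourceGroups/kv-rg"
                       "/providers/Microsoft.KeyVault/vaults/my-kv",
        "dns_name": "https://my-kv.vault.azure.net/",
    },
}
# POST {workspace_url}/api/2.0/secrets/scopes/create
```

In notebooks, secrets are then read with `dbutils.secrets.get(scope="kv-backed-scope", key="<secret-name>")` and are redacted in output.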
Cluster Configuration:
- Leverage Azure Databricks Autoscaling for dynamic resource allocation.
- Integrate Databricks clusters with Azure Virtual Network for enhanced security.
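An autoscaling cluster is declared in the Clusters API spec with an `autoscale` block instead of a fixed worker count. A sketch, with the cluster name, runtime version, and node type as assumptions to adapt:

```python
# Clusters API spec with autoscaling and auto-termination.
# Name, runtime, and node type are illustrative choices.
cluster_spec = {
    "cluster_name": "etl-autoscaling",
    "spark_version": "13.3.x-scala2.12",   # pick a current LTS runtime
    "node_type_id": "Standard_DS3_v2",
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "autotermination_minutes": 30,         # shut down idle clusters to save cost
}
```

Setting `autotermination_minutes` alongside autoscaling is a common cost-control pairing.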
Data Storage and Integration:
- Use Azure Data Lake Storage (ADLS) or Azure Blob Storage for data storage.
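Access to ADLS Gen2 over `abfss://` is typically configured with OAuth against a service principal. A sketch of the Spark config keys involved; the storage account, client ID, and tenant ID are placeholders, and the client secret should come from a secret scope, never plain text:

```python
# Spark configuration for OAuth access to ADLS Gen2 via a service
# principal. Account name, client id, and tenant id are illustrative.
storage_account = "mydatalake"
adls_oauth_conf = {
    f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net": "OAuth",
    f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net": "<sp-client-id>",
    f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net": "<from-secret-scope>",
    f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# On a cluster:
#   for k, v in adls_oauth_conf.items():
#       spark.conf.set(k, v)
#   df = spark.read.load("abfss://container@mydatalake.dfs.core.windows.net/path")
```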
Logging and Monitoring:
- Configure Azure Monitor for logging and monitoring.
- Utilize Azure Log Analytics for centralized log storage and analysis.
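At the cluster level, driver and executor logs can be delivered to a durable location via `cluster_log_conf` in the cluster spec. A sketch, with the DBFS destination as a placeholder:

```python
# Cluster spec fragment: deliver cluster logs to a DBFS path every
# few minutes. The destination path is illustrative.
cluster_log_fragment = {
    "cluster_log_conf": {
        "dbfs": {"destination": "dbfs:/cluster-logs"}
    }
}
```

Shipping Spark metrics into Log Analytics typically also requires an init script or a monitoring library on the cluster, in addition to Azure Monitor on the workspace resources.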
Azure Databricks Jobs:
- Schedule jobs using Azure Databricks Jobs for automated execution.
- Use Azure Data Factory for orchestrating ETL workflows if needed.
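A scheduled job is declared through the Jobs API with tasks, a job cluster, and a cron schedule. A hedged sketch of the create payload; the job name, notebook path, cluster sizing, and cron expression are all assumptions to adapt:

```python
# Jobs API 2.1 create payload: one notebook task on an autoscaling
# job cluster, scheduled nightly. All names and paths are illustrative.
job_payload = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/team/etl/ingest"},
            "job_cluster_key": "etl_cluster",
        }
    ],
    "job_clusters": [
        {
            "job_cluster_key": "etl_cluster",
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "autoscale": {"min_workers": 1, "max_workers": 4},
            },
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # 02:00 daily
        "timezone_id": "UTC",
    },
}
# POST {workspace_url}/api/2.1/jobs/create
```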
Azure Databricks Delta Lake:
- Consider using Delta Lake for efficient storage, management, and processing of big data.
- Utilize Delta Lake for ACID transactions and schema evolution.
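Schema evolution on Delta writes is opt-in via the `mergeSchema` writer option. A sketch of the options and the equivalent writer chain (the table path is a placeholder; the chained call runs only on a cluster):

```python
# Delta writer options enabling additive schema evolution on append.
# The target path in the comment below is illustrative.
delta_write_options = {
    "format": "delta",
    "mode": "append",
    "options": {"mergeSchema": "true"},  # allow new columns to be added
}

# On a cluster this corresponds to:
#   (df.write.format("delta").mode("append")
#      .option("mergeSchema", "true")
#      .save("abfss://lake@mydatalake.dfs.core.windows.net/silver/events"))
```

ACID guarantees come from Delta's transaction log, so concurrent appends and reads against the same table remain consistent.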
Azure DevOps Integration:
- Integrate Databricks with Azure DevOps for continuous integration and deployment.
- Automate deployments using Azure DevOps pipelines.
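One common CD step is pointing a Databricks Repo at the released branch after a pipeline run, via the Repos API. A sketch of the request a pipeline task would issue; the repo ID and branch name are placeholders:

```python
# CD-step sketch: update a Databricks Repo to track the release branch.
# repo id and branch are illustrative; authenticate with an AAD token.
repos_update_request = {
    "method": "PATCH",
    "path": "/api/2.0/repos/<repo-id>",
    "body": {"branch": "release"},
}
```

An Azure DevOps pipeline would send this request (e.g. with the Databricks CLI or `curl`) as its deploy stage, so the workspace always runs the reviewed, released code.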

