3 weeks ago
I'm running into an issue with my Azure Databricks workspace, which is deployed in my own VNet. I've successfully created two private endpoints: databricks_ui_api and browser_authentication.
However, when I try to create a cluster, I get the following error:
CONTROL_PLANE_REQUEST_FAILURE: Network health check reported that instance is unable to reach Databricks Control Plane. Please check that instances have connectivity to the Databricks Control Plane. Instance bootstrap inferred timeout reason: NetworkHealthCheck_CP_Failed
I've verified that the private endpoints are deployed and DNS resolution seems fine. But the cluster still fails to start due to what looks like a control plane connectivity issue.
Has anyone faced this before?
Thanks in advance for any guidance!
3 weeks ago
Hi @eshwari ,
Do you have some kind of firewall in your setup? It could be blocking outbound traffic to the control plane.
The easiest way to troubleshoot is to deploy a VM into one of the workspace subnets and run the usual connectivity checks (nc, ping, telnet, etc.).
The page below lists the control plane IP addresses per region:
https://learn.microsoft.com/en-us/azure/databricks/resources/ip-domain-region#control-plane-ip-addre...
Then you can run the following series of commands to check connectivity (adjust for your region):
# Verify access to the web application
nc -zv 40.118.174.12 443
nc -zv 20.42.129.160 443
# Verify access to the secure compute connectivity relay
nc -zv tunnel.westus.azuredatabricks.net 443
# Verify Artifact Blob storage access
nc -zv dbartifactsprodwestus.blob.core.windows.net 443
nc -zv arprodwestusa1.blob.core.windows.net 443
..
nc -zv arprodwestusa15.blob.core.windows.net 443
nc -zv dbartifactsprodwestus2.blob.core.windows.net 443
# Verify Metastore Database access
nc -zv consolidated-westus-prod-metastore.mysql.database.azure.com 3306
nc -zv consolidated-westus-prod-metastore-addl-1.mysql.database.azure.com 3306
nc -zv consolidated-westus-prod-metastore-addl-2.mysql.database.azure.com 3306
nc -zv consolidated-westus-prod-metastore-addl-3.mysql.database.azure.com 3306
nc -zv consolidated-westus2c2-prod-metastore-addl-1.mysql.database.azure.com 3306
# Verify Log Blob storage access
nc -zv dblogprodwestus.blob.core.windows.net 443
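If it helps, the checks above can be wrapped in a small script that reports OK or FAIL per endpoint instead of being run one at a time. This is only a sketch: the function names `probe` and `check_endpoints` are made up for illustration, and it assumes an `nc` build that supports `-z` and `-w` (timeout in seconds).

```shell
# probe HOST PORT: succeeds if a TCP connection opens within 5 seconds.
# Assumes an nc build that supports -z (scan only) and -w (timeout).
probe() {
  nc -z -w 5 "$1" "$2" >/dev/null 2>&1
}

# check_endpoints HOST:PORT...: prints OK or FAIL for each endpoint
# and returns non-zero if at least one endpoint was unreachable.
check_endpoints() {
  failed=0
  for endpoint in "$@"; do
    host=${endpoint%:*}    # text before the last colon
    port=${endpoint##*:}   # text after the last colon
    if probe "$host" "$port"; then
      echo "OK $host:$port"
    else
      echo "FAIL $host:$port"
      failed=1
    fi
  done
  return "$failed"
}

# Example call (hostnames taken from the westus commands above):
# check_endpoints \
#   tunnel.westus.azuredatabricks.net:443 \
#   dbartifactsprodwestus.blob.core.windows.net:443 \
#   consolidated-westus-prod-metastore.mysql.database.azure.com:3306
```

Running it from a VM in each workspace subnet should show which group of endpoints the firewall is blocking.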
3 weeks ago
Thank you for your response.
The connectivity check commands fail, and we do have a firewall enabled. But I have created private endpoints for Databricks; wouldn't that be enough? Do I need to explicitly allow outbound internet traffic to the region-specific IPs?
3 weeks ago
Hi @eshwari ,
Unfortunately, private endpoints alone are not enough. You still need a UDR that allows outbound access to the resources listed above.
Alternatively, if all traffic from your Databricks subnets goes through Azure Firewall, you need to allow the IPs of those resources in the firewall to make it work.
User-defined route settings for Azure Databricks - Azure Databricks | Microsoft Learn
3 weeks ago
Hello @eshwari
If your Databricks cluster fails to start with an error such as "Cluster terminated. Reason: Control Plane Request Failure... Failed to get instance bootstrap steps from the Databricks Control Plane," it indicates that the cluster nodes can't communicate with the Databricks Control Plane. The control plane is the backend service that manages the cluster's lifecycle, handles job scheduling, and serves the web application.
The recommended approach is as follows:
Configuring User-Defined Routes (UDRs) with the AzureDatabricks service tag and a next hop type of Internet is the recommended method for setting up network routing for Azure Databricks. This approach removes the need to update your route tables manually, so your Databricks clusters keep their connectivity to essential backend services as IP ranges change.
Automatic Updates: The AzureDatabricks service tag automatically includes all necessary IP address ranges for Databricks Control Planes, web apps, and Secured Cluster Connectivity (SCC) relays in your region. Azure handles the management of these IP addresses, so you don't have to manually track and update your route tables as new services are added or IPs change.
Simplified Management: By using this single service tag, you create a robust and future-proof routing solution with minimal configuration. This prevents cluster failures that might otherwise occur due to outdated network rules.
Guaranteed Connectivity: With the UDR directing traffic to the Internet, you ensure that all outbound traffic from your Databricks cluster to the services represented by the AzureDatabricks tag is correctly routed, maintaining reliable operation.
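As a rough sketch, such a route could be added with the Azure CLI as shown below. The function name `add_databricks_route` and the route name `to-databricks-control-plane` are placeholders picked for illustration; the resource group and route table are whatever you use for the workspace subnets, and the route table has to be associated with those subnets for the route to take effect.

```shell
# add_databricks_route RESOURCE_GROUP ROUTE_TABLE
# Adds a route that sends traffic for the AzureDatabricks service tag
# straight to the Internet next hop. The route name is a placeholder.
add_databricks_route() {
  az network route-table route create \
    --resource-group "$1" \
    --route-table-name "$2" \
    --name to-databricks-control-plane \
    --address-prefix AzureDatabricks \
    --next-hop-type Internet
}

# Example call (both names are hypothetical):
# add_databricks_route my-rg my-databricks-route-table
```

The other endpoints checked earlier in the thread (metastore, artifact and log storage) are separate destinations, so they may need their own routes or firewall rules.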
a week ago
Hello @eshwari
Good day. I think szymon_dybczak and I have provided enough information. Please let us know whether you found a solution.
If one of the replies solved your problem, you can mark it as the accepted solution, which helps others. 🙂
Have a great day!