Administration & Architecture

Unable to Create Cluster in ADW Deployment — CONTROL_PLANE_REQUEST_FAILURE

eshwari
New Contributor II

I'm running into an issue with my Azure Databricks workspace, which is deployed in my own VNet. I've successfully created two private endpoints: databricks_ui_api and browser_authentication.

However, when I try to create a cluster, I get the following error:

CONTROL_PLANE_REQUEST_FAILURE: Network health check reported that instance is unable to reach Databricks Control Plane. Please check that instances have connectivity to the Databricks Control Plane. Instance bootstrap inferred timeout reason: NetworkHealthCheck_CP_Failed

I've verified that the private endpoints are deployed and DNS resolution seems fine. But the cluster still fails to start due to what looks like a control plane connectivity issue.

Has anyone faced this before?

  • Are there additional endpoints or NSG rules I might be missing?
  • Is there a way to validate control plane connectivity from the workspace?
  • Any tips on debugging this in a private VNet setup?

Thanks in advance for any guidance!

1 ACCEPTED SOLUTION

Accepted Solutions

szymon_dybczak
Esteemed Contributor III

Hi @eshwari ,

Unfortunately, that's not enough. You still need a user-defined route (UDR) that allows outbound access to the following resources:

[Screenshot: szymon_dybczak_0-1754903981361.png]

Alternatively, if all traffic from your Databricks subnets goes through Azure Firewall, you need to whitelist the IPs of the resources above in the firewall.

User-defined route settings for Azure Databricks - Azure Databricks | Microsoft Learn

 

 

 


5 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @eshwari ,

Do you have a firewall in your setup? It could be blocking outbound traffic to the control plane.
The easiest way to troubleshoot is to deploy a VM into one of the workspace subnets and run the usual connectivity checks from it (nc, ping, telnet, etc.).

The page below lists the control plane IP addresses per region:

https://learn.microsoft.com/en-us/azure/databricks/resources/ip-domain-region#control-plane-ip-addre...

 

Then you can run the following commands to check connectivity (adjust for your region):

 

# Verify access to the web application
nc -zv 40.118.174.12 443
nc -zv 20.42.129.160 443

# Verify access to the secure compute connectivity relay
nc -zv tunnel.westus.azuredatabricks.net 443

# Verify Artifact Blob storage access
nc -zv dbartifactsprodwestus.blob.core.windows.net 443
nc -zv arprodwestusa1.blob.core.windows.net 443
# .. (intermediate arprodwestusa* accounts omitted)
nc -zv arprodwestusa15.blob.core.windows.net 443
nc -zv dbartifactsprodwestus2.blob.core.windows.net 443

# Verify Metastore Database access
nc -zv consolidated-westus-prod-metastore.mysql.database.azure.com 3306
nc -zv consolidated-westus-prod-metastore-addl-1.mysql.database.azure.com 3306
nc -zv consolidated-westus-prod-metastore-addl-2.mysql.database.azure.com 3306
nc -zv consolidated-westus-prod-metastore-addl-3.mysql.database.azure.com 3306
nc -zv consolidated-westus2c2-prod-metastore-addl-1.mysql.database.azure.com 3306

# Verify Log Blob storage access
nc -zv dblogprodwestus.blob.core.windows.net 443
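
Running those checks one by one gets tedious, so here is a minimal sketch of a wrapper that reads "host port" pairs and prints one status line per endpoint. The function names `probe` and `check_endpoints` and the 5-second timeout are my own assumptions, not anything from the Databricks docs, and it assumes your `nc` build supports `-z` and `-w`.

```shell
#!/usr/bin/env bash
# probe HOST PORT: succeeds if a TCP connection opens within 5 seconds.
# Assumes an nc build that supports -z and -w, like the commands above.
probe() {
  nc -z -w 5 "$1" "$2" </dev/null >/dev/null 2>&1
}

# check_endpoints: reads "host port" lines from stdin, prints a status per
# endpoint, and returns non-zero if at least one endpoint is unreachable.
check_endpoints() {
  local host port failed=0
  while read -r host port; do
    # Skip blank lines and lines starting with '#'.
    [ -z "$host" ] && continue
    case "$host" in \#*) continue ;; esac
    if probe "$host" "$port"; then
      printf 'OK    %s:%s\n' "$host" "$port"
    else
      printf 'FAIL  %s:%s\n' "$host" "$port"
      failed=1
    fi
  done
  return "$failed"
}

# Example (hostnames taken from the West US list above):
#   check_endpoints <<'EOF'
#   tunnel.westus.azuredatabricks.net 443
#   dblogprodwestus.blob.core.windows.net 443
#   consolidated-westus-prod-metastore.mysql.database.azure.com 3306
#   EOF
```

Run it from a VM inside one of the workspace subnets, otherwise the result says nothing about what the cluster nodes can reach. A FAIL line only tells you the TCP connection did not open in time; it does not distinguish a DNS problem from a blocked route.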

 

eshwari
New Contributor II

Thank you for your response.

The connectivity-check commands fail, and we do have a firewall enabled. But I have created private endpoints for Databricks; wouldn't that be enough? Do I need to explicitly allow outbound internet traffic to the region-specific IPs?

szymon_dybczak
Esteemed Contributor III

Hi @eshwari ,

Unfortunately, that's not enough. You still need a user-defined route (UDR) that allows outbound access to the following resources:

[Screenshot: szymon_dybczak_0-1754903981361.png]

Alternatively, if all traffic from your Databricks subnets goes through Azure Firewall, you need to whitelist the IPs of the resources above in the firewall.

User-defined route settings for Azure Databricks - Azure Databricks | Microsoft Learn

 

 

 

Khaja_Zaffer
Contributor

Hello @eshwari 

If your Databricks cluster fails to start with the error message "Cluster terminated. Reason: Control Plane Request Failure... Failed to get instance bootstrap steps from the Databricks Control Plane," it's a clear indication that the worker nodes of the cluster can't communicate with the Databricks Control Plane. The control plane is the backend service that manages the cluster's lifecycle, handles job scheduling, and serves as the web application interface.

This is recommended: 

Configuring User-Defined Routes (UDRs) with the AzureDatabricks service tag and a next hop type of Internet is the recommended method for setting up network routing for Azure Databricks. This approach eliminates the need for manual updates to your route tables, ensuring your Databricks clusters can always communicate with essential backend services.

 

Why It's Recommended

 

  • Automatic Updates: The AzureDatabricks service tag automatically includes all necessary IP address ranges for Databricks Control Planes, web apps, and Secured Cluster Connectivity (SCC) relays in your region. Azure handles the management of these IP addresses, so you don't have to manually track and update your route tables as new services are added or IPs change.

  • Simplified Management: By using this single service tag, you create a robust and future-proof routing solution with minimal configuration. This prevents cluster failures that might otherwise occur due to outdated network rules.

  • Guaranteed Connectivity: With the UDR directing traffic to the Internet, you ensure that all outbound traffic from your Databricks cluster to the services represented by the AzureDatabricks tag is correctly routed, maintaining reliable operation.

    [Screenshot: Khaja_Zaffer_0-1754730082717.png]
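
As a rough sketch of the route described above, the helper below prints (but does not run) an Azure CLI command that adds a route for the AzureDatabricks service tag with next hop type Internet. The function name `build_udr_command` and the route name `to-azure-databricks` are hypothetical placeholders, and the resource group and route table names are whatever you pass in. It also assumes the route table is already associated with both Databricks subnets. Note that the connectivity checks earlier in this thread also cover artifact storage, the metastore, and log storage, so those may need their own routes or firewall rules.

```shell
#!/usr/bin/env bash
# build_udr_command RESOURCE_GROUP ROUTE_TABLE
# Prints (does not run) an az CLI call that adds a route for the
# AzureDatabricks service tag with next hop type Internet.
# The route name "to-azure-databricks" is a hypothetical placeholder.
# Assumes the two arguments contain no spaces or shell metacharacters.
build_udr_command() {
  printf 'az network route-table route create'
  printf ' --resource-group %s --route-table-name %s' "$1" "$2"
  printf ' --name to-azure-databricks'
  printf ' --address-prefix AzureDatabricks --next-hop-type Internet\n'
}

# Example:
#   build_udr_command my-rg my-route-table        # review the command
#   build_udr_command my-rg my-route-table | sh   # run it once it looks right
```

Printing the command first is a deliberate choice: route table changes affect every subnet the table is attached to, so it is worth reviewing the exact call before applying it.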

     




Khaja_Zaffer
Contributor

Hello @eshwari 

Good day. I think szymon_dybczak and I have provided enough information. Please let me know if this resolved your issue.

If you found an answer useful, you can mark it as the accepted solution, which helps others. 🙂

Have a great day!
