
Azure Databricks Control Plane connectivity issue after migrating to vWAN

nodeb
New Contributor

Hello everyone,

Recently, I received a client request to migrate our Azure Databricks environment from a Hub-and-Spoke architecture to a vWAN Hub architecture with an NVA (Network Virtual Appliance).

Here's a quick overview of the setup:

  • The Databricks workspace is VNet-injected.

  • Private Endpoints are configured for all required services.

  • Two subnets are in use: Public Host and Private Host.

  • The routing intent on the vWAN hub is configured to send all traffic through the NVA.

  • Storage accounts and DNS resolution (Private Link) work correctly (verified through a VM on the same VNet).

  • The issue affects only connectivity to the Databricks Control Plane: cluster nodes in the compute plane cannot reach it.

Error Message:

Failed to add 1 worker to the compute. Will attempt retry: true.
Reason: Control Plane Request Failure Due To Misconfig

CONTROL_PLANE_REQUEST_FAILURE:
Network health check reported that instance is unable to reach Databricks Control Plane.
Please check that instances have connectivity to the Databricks Control Plane.
Instance bootstrap inferred timeout reason: NetworkHealthCheck_CP_Failed

Failure message (Base64 encoded):
dW5yZWFjaGFibGUgY3VybDogKDI4KSBSZXNvbHZpbmcgdGltZWQgb3V0IGFmdGVyIDEwMDAwIG1pbGxpc2Vjb25kcw==

VM extension code: ProvisioningState/succeeded
InstanceId: 3fc5930e53d94adb80120a420bae2724
WorkerEnv: workerenv-85992446950252
NetworkHealthCheck finished with exit code 125.
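
The Base64-encoded failure message above can be decoded to see what the health check actually reported. A minimal sketch in Python, using only the standard library (the variable names are illustrative):

```python
import base64

# Failure message copied verbatim from the error output above.
encoded = (
    "dW5yZWFjaGFibGUgY3VybDogKDI4KSBSZXNvbHZpbmcgdGltZWQgb3V0"
    "IGFmdGVyIDEwMDAwIG1pbGxpc2Vjb25kcw=="
)

# Decode the Base64 payload into readable text.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
# prints: unreachable curl: (28) Resolving timed out after 10000 milliseconds
```

The decoded text reports a name resolution timeout rather than a refused or blocked TCP connection, which suggests looking at how DNS queries from the cluster subnets are handled after the migration.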

 

Troubleshooting done so far:

  • Verified NSG rules on both host subnets (allowing outbound 443).

  • Confirmed Private Endpoints are resolving correctly.

  • Checked that routing intent is sending outbound traffic via NVA as expected.

  • Validated that the same setup works in our previous Hub-and-Spoke model.
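
To separate DNS failures from blocked connections, a check like the one below can be run from a VM in the same VNet as the cluster subnets. This is a rough sketch under stated assumptions, not an official Databricks tool: the function name `check_endpoint` is hypothetical, and the hostnames to test have to be supplied by you (for example, your workspace URL), since the exact control plane endpoints depend on the region.

```python
import socket
import sys


def check_endpoint(host: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Report whether `host` resolves and accepts a TCP connection on `port`."""
    result = {"host": host, "resolved": [], "connected": False, "error": None}

    # Step 1: name resolution through the VM's configured DNS servers.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        result["resolved"] = sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        result["error"] = f"DNS resolution failed: {exc}"
        return result

    # Step 2: TCP connect to each resolved address until one succeeds.
    for address in result["resolved"]:
        try:
            with socket.create_connection((address, port), timeout=timeout):
                result["connected"] = True
                result["error"] = None
                return result
        except OSError as exc:
            result["error"] = f"TCP connect to {address}:{port} failed: {exc}"
    return result


if __name__ == "__main__":
    # Hostnames are passed on the command line; none are assumed here.
    for hostname in sys.argv[1:]:
        print(check_endpoint(hostname))
```

Saved under a name of your choice (say, `check_endpoint.py`), it can be run as `python check_endpoint.py <hostname>`. An empty `resolved` list points at DNS, while resolved addresses with `connected` set to `False` point at routing, NSG, or NVA filtering.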

It seems that when using secured vWAN hubs with routing intent, the control plane traffic might not be reaching Databricks public endpoints.

Has anyone experienced similar issues or found a way to route control plane traffic properly through vWAN (or bypass it when needed)?

Any guidance or best practices for Databricks + vWAN + NVA setups would be appreciated.

Thanks,

1 REPLY

nodeb
New Contributor

The problem is fixed.