3 weeks ago
I am trying to start a cluster in Az databricks,our policy is to use proxy for outbound traffic. I have configured http_proxy, https_proxy, HTTP_PROXY, HTTPS_PROXY, no_proxy and NO_PROXY List in env variables and global . Made sure proxy is bypassing SSL inception.
when we start cluster it fails with following message
X_NHC_MULTIPLE_COMPONENTS_FAILURE: Instance failed network health check before bootstrapping with fatal error: X_NHC_MULTIPLE_COMPONENTS_FAILURE 3 failed component(s): control_plane internet storage Retryable: false Based on the failure results: List(entity: "arprodeastusa9.blob.core.windows.net" outcome: "unreachable" duration_sec:
When verified its not reaching proxy and firewall is blocking how can this be fixed?
2 weeks ago
Hello,
Could you please clarify if you have also enabled Private Link connectivity?
Wednesday
Hi Yes Private link connectivity is enabled.
Wednesday
Error Message :
{
"reason": {
"code": "NETWORK_CHECK_STORAGE_FAILURE",
"type": "CLOUD_FAILURE",
"parameters": {
"databricks_error_message": " [details] X_NHC_STORAGE_UNREACHABLE: Instance failed network health check before bootstrapping with fatal error: X_NHC_STORAGE_UNREACHABLE\n2 failed component(s): internet storage\nRetryable: true\nBased on the failure results: List(entity: \"arprodeastusa2.blob.core.windows.net\"\noutcome: \"unreachable\"\nduration_sec: 227.27078\nmessage: \"curl: (28) Operation timed out after 10000 milliseconds with 0 bytes received\"\nlast_error_code: 28\n, entity: \"arprodeastusa10.blob.core.windows.net\"\noutcome: \"unreachable\"\nduration_sec: 227.26895\nmessage: \"curl: (28) Operation timed out after 10000 milliseconds with 0 bytes received\"\nlast_error_code: 28\n, entity: \"dbartifactsprodeastus.blob.core.windows.net\"\noutcome: \"unreachable\"\nduration_sec: 227.26909\nmessage: \"curl: (28) Operation timed out after 10000 milliseconds with 0 bytes received\"\nlast_error_code: 28\n, entity: \"arprodeastus2a7.blob.core.windows.net\"\noutcome: \"unreachable\"\nduration_sec: 227.26433\nmessage: \"curl: (28) Operation timed out after 10000 milliseconds with 0 bytes received\"\nlast_error_code: 28\n, entity: \"arprodeastus2a4.blob.core.windows.net\"\noutcome: \"unreachable\"\nduration_sec: 227.26439\nmessage: \"curl: (28) Operation timed out after 10000 milliseconds with 0 bytes received\"\nlast_error_code: 28\n, entity: \"www.databricks.com\"\noutcome: \"ssl_error\"\nduration_sec: 226.52838\nmessage: \"curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to www.databricks.com:443\"\nlast_error_code: 35\n)(OnDemand)",
"instance_id": "REDACTED_FOR_HIDDEN",
"azure_error_code": "X_NHC_STORAGE_UNREACHABLE",
"azure_error_message": "Instance failed network health check before bootstrapping with fatal error: X_NHC_STORAGE_UNREACHABLE\n2 failed component(s): internet storage\nRetryable: true\nBased on the failure results: List(entity: \"arprodeastusa2.blob.core.windows.net\"\noutcome: \"unreachable\"\nduration_sec: 227.27078\nmessage: \"curl: (28) Operation timed out after 10000 milliseconds with 0 bytes received\"\nlast_error_code: 28\n, entity: \"arprodeastusa10.blob.core.windows.net\"\noutcome: \"unreachable\"\nduration_sec: 227.26895\nmessage: \"curl: (28) Operation timed out after 10000 milliseconds with 0 bytes received\"\nlast_error_code: 28\n, entity: \"dbartifactsprodeastus.blob.core.windows.net\"\noutcome: \"unreachable\"\nduration_sec: 227.26909\nmessage: \"curl: (28) Operation timed out after 10000 milliseconds with 0 bytes received\"\nlast_error_code: 28\n, entity: \"arprodeastus2a7.blob.core.windows.net\"\noutcome: \"unreachable\"\nduration_sec: 227.26433\nmessage: \"curl: (28) Operation timed out after 10000 milliseconds with 0 bytes received\"\nlast_error_code: 28\n, entity: \"arprodeastus2a4.blob.core.windows.net\"\noutcome: \"unreachable\"\nduration_sec: 227.26439\nmessage: \"curl: (28) Operation timed out after 10000 milliseconds with 0 bytes received\"\nlast_error_code: 28\n, entity: \"www.databricks.com\"\noutcome: \"ssl_error\"\nduration_sec: 226.52838\nmessage: \"curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to www.databricks.com:443\"\nlast_error_code: 35\n)(OnDemand)"
}
},
"add_node_failure_details": {
"failure_count": 3,
"resource_type": "container",
"will_retry": true
}
}
Proxy was enabled via inti_scripts and it uploaded into volumes, on Zscaler, SSL_BYPASS was enabled, for whole subnet IP/CIDR range. also blob storage accounts have been allowed. Traffic is hitting firewall and dropped.
Friday
Thank you for the details.
Kindly confirm if you have allowed Artifact Blob storage primary and Artifact Blob storage secondary for EastUS in your firewall. Please review this page for the FQDNs - https://learn.microsoft.com/en-us/azure/databricks/resources/ip-domain-region
Wednesday
Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!
Sign Up Now