Hi @jorgemarmol, there are several possible reasons why the pipeline cluster is unreachable.
Some of the possible reasons include:
1. Infrastructure issues: The dev clusters are occasionally down because infra changes are often tested there first, which can make them unstable or unavailable. While that is happening, deploys to those clusters will fail. You can ignore those failures if your release still succeeds in staging despite the infra issues in dev.
2. Self Bootstrap Failure or NPIP Tunnel Setup Failure: This error indicates that bootstrap failed because of network connectivity issues between the data plane and the control plane. More specifically, either the Databricks secure cluster connectivity (SCC) relay is not reachable, or files cannot be downloaded from the artefact bucket because DNS resolution is failing.
3. Spark driver unreachable: This can be caused by invalid Spark configurations or malfunctioning init scripts. It can also occur if the customer VPC has no NAT gateway or other egress path to the Databricks control plane, or if an egress firewall rule in the customer VPC blocks traffic to the control plane.
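For reasons 2 and 3, it helps to know whether the failure is in DNS resolution or in the egress path itself. Here is a minimal Python sketch that checks both for a single endpoint. It is only a rough first check and not a replacement for the official tooling. `check_endpoint` is a hypothetical helper name, and the hostname and port you pass in (for example the SCC relay or control plane endpoint for your region) are things you need to look up for your own workspace. Run it from a host inside the same VPC and subnet as the cluster, otherwise the result says nothing about the cluster's network path.

```python
import socket


def check_endpoint(host, port=443, timeout=5.0):
    """Return (dns_ok, tcp_ok, detail) for one host:port pair.

    Hypothetical helper: it only tests name resolution and a plain TCP
    connect, not TLS or anything at the application layer.
    """
    # Step 1: DNS resolution. A failure here points at the resolver or
    # DNS settings rather than at a firewall or missing NAT gateway.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, False, f"DNS resolution failed: {exc}"

    addr = infos[0][4][0]

    # Step 2: TCP connect. A timeout or refusal here, with DNS working,
    # suggests a blocked or missing egress path.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, True, f"resolved to {addr}, TCP connect ok"
    except OSError as exc:
        return True, False, f"resolved to {addr}, TCP connect failed: {exc}"
```

Calling `check_endpoint("<your-relay-hostname>", 443)` returns a tuple you can print. `(False, False, ...)` points at DNS, while `(True, False, ...)` points at the NAT gateway or firewall rules.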
To troubleshoot these issues, download the system log, search it for "FAILED_MESSAGE", and decode the message with a Base64 decoder. You can also use the Network Diagnostic Tool for a more detailed validation. It checks network latency, SSL and application layer connectivity, and DNS resolution against the published and specific endpoints. Run the tool in standalone mode, as described in the README bundled with the tool.
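If you would rather script the log search and decode step, something along these lines should work. It is a sketch under one assumption I have not verified against a real log: that the Base64 payload is the first run of 16 or more Base64 characters appearing after "FAILED_MESSAGE" on the same line. `decode_failed_messages` is a hypothetical name. Adjust the pattern if your log lays the message out differently.

```python
import base64
import binascii
import re

MARKER = "FAILED_MESSAGE"

# Assumption: the payload is the first run of 16+ Base64 characters
# (plus optional '=' padding) that follows the marker on the same line.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decode_failed_messages(log_text):
    """Return the decoded text of every FAILED_MESSAGE payload found."""
    decoded = []
    for line in log_text.splitlines():
        # Split the line at the marker and keep only what comes after it.
        _, found, rest = line.partition(MARKER)
        if not found:
            continue
        match = B64_RUN.search(rest)
        if match is None:
            continue
        token = match.group(0)
        # Restore padding in case the log stripped the trailing '='.
        token += "=" * (-len(token) % 4)
        try:
            raw = base64.b64decode(token, validate=True)
        except (binascii.Error, ValueError):
            # Not valid Base64 after all, so skip this line.
            continue
        decoded.append(raw.decode("utf-8", errors="replace"))
    return decoded
```

Usage is `decode_failed_messages(open("system.log").read())`, where `system.log` stands in for whatever file name the downloaded log has.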