Error on GCE job cluster after upgrading from GKE
12-23-2024 09:05 AM
We migrated our clusters from GKE to GCE as described in the Databricks documentation. Everything works on a GCE all-purpose cluster, but a GCE job cluster fails when it tries to access a Databricks-managed secret. The job runs as a service principal that has all the required permissions; the same job worked fine on the GKE job cluster. Here is the error trace:
: org.apache.http.conn.HttpHostConnectException: Connect to europe-west3.gcp.databricks.com:443 [europe-west3.gcp.databricks.com/34.159.208.230] failed: Connection timed out (Connection timed out)
File <command-1511365916692010>, line 1
----> 1 mongo_prd_user = dbutils.secrets.get(scope="<scope_name>", key="prd_user")
      2 mongo_prd_password = dbutils.secrets.get(scope="<scope_name>",key="prd_password")
12-23-2024 09:13 AM
Hi @Sadam97,
If your account uses a customer-managed VPC, you need to manually add a firewall rule that permits traffic between Databricks-managed VMs within your VPC.
You can test connectivity to the Databricks secrets API endpoint from within the GCE job cluster to rule out network issues. Use a tool such as curl or telnet to check whether the endpoint europe-west3.gcp.databricks.com:443 is reachable.
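If curl or telnet is awkward to run from a job, the same check can be sketched in Python and run as a notebook cell in the job itself. This is a minimal sketch, not a Databricks API: the helper name `check_tcp` and the 5-second default timeout are assumptions made for illustration. It only tests whether a TCP connection can be opened, which is what telnet does, and does not exercise authentication or the secrets API.

```python
import socket
import time


def check_tcp(host, port, timeout=5.0):
    """Attempt one TCP connection and return (reachable, detail).

    Hypothetical helper for illustration; it is not part of dbutils.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
            return True, f"connected in {elapsed:.2f}s"
    except OSError as exc:
        # OSError covers timeouts, refused connections and DNS failures.
        return False, f"{type(exc).__name__}: {exc}"
```

Calling `check_tcp("europe-west3.gcp.databricks.com", 443)` at the top of the failing job, before the first `dbutils.secrets.get` call, and printing the result shows whether the job cluster itself can reach the endpoint. Running it several times in a row also helps distinguish an intermittent failure from a consistent one.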
12-23-2024 09:21 AM - edited 12-23-2024 09:25 AM
I think you missed the point that it works fine on the GCE all-purpose cluster, which means that cluster is able to access the Databricks-managed secret. The issue occurs only when the GCE job cluster tries to access the secret. Also, the firewall rule was added automatically when I updated the permissions as described in this documentation.
12-23-2024 09:29 AM
Hi @Sadam97,
Thanks for your comments. Can you run the connectivity test I mentioned above? Is the failure intermittent or consistent?
12-23-2024 09:46 AM
Hi @Alberto_Umana ,
Here is the response of the telnet command on the all-purpose GCE cluster:
Trying 34.159.208.230...
Connected to europe-west3.gcp.databricks.com.
Escape character is '^]'.
Connection closed by foreign host.
12-23-2024 01:03 PM
Hi @Sadam97,
Thanks for checking. Could you please DM me your workspace ID and cluster ID so I can dig deeper into our backend logs? A connection timeout can be due to several factors.
12-24-2024 12:39 AM
Hi @Alberto_Umana ,
Here are the requested details,
cluster_id: 6223-152828-9jhx6lo7
workspace_id: 3976272202403488
12-24-2024 05:20 AM
Hi @Sadam97,
As I mentioned in my message, the failure happened because the cluster failed to add containers. This can have different causes, so I am asking you to raise a support case with us so that we can investigate it properly.

