Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

GKE Cluster Shows "Starting" Even After It's Turned On

Kayla
Valued Contributor II

Kayla_0-1749815522351.png

Curious if anyone else has run into this. After changing to GKE-based clusters, they all turn on but don't show as turned on: the UI shows a cluster as "Starting", yet we can see the same cluster in the dropdown as already active. "Changing" to that one lets us run code.

1 REPLY

mark_ott
Databricks Employee

Yes, others have reported this exact issue with Databricks clusters on Google Kubernetes Engine (GKE): after transitioning to GKE-based clusters, the UI may show a cluster as "Starting" even though it is already up and usable in practice, for example visible in dropdowns and able to run code once selected. This appears to be a recognized display or state synchronization problem rather than a fundamental failure of the cluster itself.

What Users Are Reporting

  • Clusters stay in "Starting" or "Pending" states in the Databricks UI, but the compute is accessible and operational from dropdowns or notebook actions.

  • Users are able to shift execution to the active cluster, which then allows them to run code, indicating that the UI state does not match the backend state.
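One way to confirm that the UI and the backend disagree is to ask the Clusters REST API what state it has on record. The following is a minimal sketch, assuming a personal access token and the `GET /api/2.0/clusters/get` endpoint; the helper names, the `CLUSTER_ID` environment variable, and the grouping of states into "up" and "still starting" are illustrative assumptions, not an official diagnostic.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumption: these groupings are this sketch's own reading of the API's
# `state` values, not an official classification.
UP_STATES = {"RUNNING", "RESIZING"}
TRANSITIONAL_STATES = {"PENDING", "RESTARTING"}


def get_cluster_info(host: str, token: str, cluster_id: str) -> dict:
    """Fetch one cluster's description from the workspace REST API."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/clusters/get?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize_state(info: dict) -> str:
    """Turn the API's `state` field into a one-line verdict."""
    state = info.get("state", "UNKNOWN")
    if state in UP_STATES:
        verdict = "backend reports the cluster as up"
    elif state in TRANSITIONAL_STATES:
        verdict = "backend also reports the cluster as still starting"
    else:
        verdict = "backend reports the cluster as not running"
    return f"{state}: {verdict}"


# Only runs when the (assumed) environment variables are set.
if __name__ == "__main__" and "DATABRICKS_HOST" in os.environ:
    info = get_cluster_info(
        os.environ["DATABRICKS_HOST"],
        os.environ["DATABRICKS_TOKEN"],
        os.environ["CLUSTER_ID"],  # hypothetical variable name
    )
    print(summarize_state(info))
```

If the API reports the cluster as running while the workspace page still says "Starting", that points toward a display or sync problem rather than a cluster that failed to launch.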

Common Causes and Investigations

  • This may occur due to delays or errors in Databricks' backend synchronizing state information from GKE to the workspace UI.

  • Issues with network configuration, firewalls, or role permissions can sometimes contribute to similar symptoms, especially after architectural changes like moving to GKE.

  • There have been reports across Databricks' forums about clusters being stuck in transition states for hours, sometimes resolved only by adjusting firewall rules or reviewing service account permissions.

Workarounds and Notes

  • "Changing" to the same cluster in the dropdown or refreshing your workspace may allow job execution even if the UI says "Starting".

  • If the problem persists across different sessions or users, checking cluster logs, Kubernetes events, or escalating to Databricks support is recommended, especially if recent changes to networking, permissions, or Databricks Runtime were made.
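If you do end up escalating, a timeline of recent cluster events is useful to attach to the ticket. Below is a rough sketch that pulls them from the `POST /api/2.0/clusters/events` endpoint and prints one line per event; the function names and the `CLUSTER_ID` environment variable are hypothetical, and it assumes token-based authentication.

```python
import json
import os
import urllib.request
from datetime import datetime, timezone


def fetch_cluster_events(host: str, token: str, cluster_id: str, limit: int = 25) -> list:
    """Request the most recent events for one cluster, newest first."""
    body = json.dumps(
        {"cluster_id": cluster_id, "order": "DESC", "limit": limit}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/clusters/events",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response).get("events", [])


def format_events(events: list) -> list:
    """Render each event as 'UTC timestamp  EVENT_TYPE'."""
    lines = []
    for event in events:
        # The API reports timestamps in milliseconds since the epoch.
        when = datetime.fromtimestamp(event["timestamp"] / 1000, tz=timezone.utc)
        lines.append(f"{when:%Y-%m-%d %H:%M:%S}Z  {event.get('type', 'UNKNOWN')}")
    return lines


# Only runs when the (assumed) environment variables are set.
if __name__ == "__main__" and "DATABRICKS_HOST" in os.environ:
    events = fetch_cluster_events(
        os.environ["DATABRICKS_HOST"],
        os.environ["DATABRICKS_TOKEN"],
        os.environ["CLUSTER_ID"],  # hypothetical variable name
    )
    print("\n".join(format_events(events)))
```

This only covers the Databricks side of the timeline; Kubernetes events would have to be gathered separately from the GKE cluster itself.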

Summary Table

| Symptom | Usable Workaround | Known Causes | Additional Steps |
|---|---|---|---|
| Cluster UI shows "Starting" | Select active cluster; run code | Backend sync lag | Check network/permission issues |
| Cluster stuck in "Pending" | Sometimes needs config tweak | Firewall/permissions | Review/adjust cloud settings |
| Still not working | -- | Various/cloud issues | Contact Databricks support |

This behavior is not unique to your workspace. If it keeps happening, it is worth opening a support case with Databricks directly.
