Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Cluster is not starting

Pavan578
New Contributor II

Cluster 'xxxxxxx' was terminated. Reason: WORKER_SETUP_FAILURE (SERVICE_FAULT). Parameters: databricks_error_message:DBFS Daemon is not reachable., gcp_error_message:Unable to reach the colocated DBFS Daemon.

Can anyone help me resolve this issue?

2 REPLIES

agallard
Contributor

Hi @Pavan578 

To troubleshoot the WORKER_SETUP_FAILURE (SERVICE_FAULT) error with the message "DBFS Daemon is not reachable" on a Databricks cluster running on Google Cloud Platform (GCP), here are some steps you can follow:

  1. Check Databricks Service Status:
    • First, check both the Google Cloud Console and the Databricks status page to see whether there are any ongoing incidents affecting Databricks services or resources that could impact DBFS connectivity.
  2. Review Cluster Configuration:
    • Make sure the cluster is configured correctly to access Google Cloud Storage and DBFS. This includes verifying the IAM permissions and access credentials used by both Databricks and DBFS (the SDK sketch after this list shows one way to pull the current configuration).
  3. Adjust Network Settings and Firewalls:
    • In some cases, DBFS access issues come down to network settings or firewall rules in GCP. Ensure the workers can reach DBFS and that the required ports (e.g., port 443 for HTTPS) are open (see the connectivity sketch after this list).
  4. Update the Cluster Image:
    • Try moving to a more recent Databricks Runtime version, as newer releases may contain fixes for connectivity and DBFS configuration issues.
  5. Examine Cluster Logs:
    • Check the cluster event logs in Databricks for additional error messages from the workers that might give more context about the connectivity failure (the SDK sketch after this list also retrieves these events).
  6. Resource Scaling and Internal Connectivity:
    • Make sure the cluster has enough resources allocated, and consider scaling up or out if the workers are overloaded, as this can sometimes prevent proper DBFS connectivity.
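
For steps 2 and 5, here is a minimal sketch of how you could pull the cluster configuration and its recent event log with the Databricks SDK for Python. This assumes the databricks-sdk package is installed and authentication is already configured (environment variables or a ~/.databrickscfg profile); the cluster ID is a placeholder.

    from itertools import islice

    from databricks.sdk import WorkspaceClient

    # Auth is picked up from DATABRICKS_HOST/DATABRICKS_TOKEN or ~/.databrickscfg
    w = WorkspaceClient()

    cluster_id = "xxxxxxx"  # replace with the failing cluster's ID

    # Step 2: inspect the cluster configuration (runtime version, node type, state)
    info = w.clusters.get(cluster_id=cluster_id)
    print(info.spark_version, info.node_type_id, info.state, info.state_message)

    # Step 5: dump the most recent cluster events, which often carry more detail
    # than the termination banner shown in the UI
    for event in islice(w.clusters.events(cluster_id=cluster_id), 20):
        print(event.timestamp, event.type, event.details)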
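For step 3, a quick way to confirm outbound HTTPS connectivity is a simple socket test run from a notebook on a working cluster in the same VPC/subnet (or from a GCE VM there). The hostnames below are only examples; replace the workspace placeholder with your own workspace URL.

    import socket

    # Endpoints the workers need to reach over HTTPS (port 443).
    # Adjust these to your region/workspace; the workspace host is a placeholder.
    hosts = [
        "storage.googleapis.com",               # Google Cloud Storage backing DBFS on GCP
        "<your-workspace>.gcp.databricks.com",  # Databricks control plane (replace)
    ]

    for host in hosts:
        try:
            with socket.create_connection((host, 443), timeout=5):
                print(f"OK:   {host}:443 is reachable")
        except OSError as exc:
            print(f"FAIL: {host}:443 is not reachable ({exc})")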

Those are the steps I can think of.

Try them out and let us know! 😉

Alfonso Gallardo
-------------------
 I love working with tools like Databricks, Python, Azure, Microsoft Fabric, Azure Data Factory, and other Microsoft solutions, focusing on developing scalable and efficient solutions with Apache Spark

Pavan578
New Contributor II

Thanks @agallard. I will check the above steps and let you know.

 
