Hello Databricks Community,
It looks like your cluster is being terminated due to a lost connection with the driver node, which could be caused by network instability or malfunctioning instances. The second error message suggests that the cluster is being terminated while scaling up, possibly due to resource allocation issues.
Here are a few things you can check:
- Cluster Logs: Review the logs in Databricks to see if there are more specific error messages.
- Cloud Provider Limits: Ensure that your cloud provider is not enforcing limits on the number of instances you can allocate.
- Networking Issues: Check your VPC settings, security groups, and firewall rules to ensure there are no restrictions on communication between nodes.
- Instance Availability: Sometimes, cloud providers have shortages of specific instance types, which can cause scaling issues. Try using different instance types.
- Databricks Support: If the issue persists, consider reaching out to Databricks support with your cluster ID and logs for further investigation.
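If it helps with the first and last items above, here is a rough sketch of pulling the cluster's event history over the REST API and filtering it down to the termination-related entries, so you have something concrete to read through or attach to a support ticket. It assumes the Clusters API events endpoint (`POST /api/2.0/clusters/events`) and the response field names (`events`, `type`, `timestamp`, `details.reason.code`) as I understand them, so please check them against the API reference for your workspace. The `host`, `token`, and `cluster_id` values and both function names are placeholders of mine, not anything from your setup.

```python
import json
import urllib.request

# Event types that usually accompany a lost driver/node or a termination.
# This selection is an assumption; adjust it to what you see in your logs.
INTERESTING = {
    "TERMINATING",
    "DRIVER_NOT_RESPONDING",
    "DRIVER_UNAVAILABLE",
    "NODES_LOST",
}


def fetch_cluster_events(host, token, cluster_id, limit=50):
    """Request recent events for one cluster and return the 'events' list."""
    req = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/clusters/events",
        data=json.dumps({"cluster_id": cluster_id, "limit": limit}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("events", [])


def summarize_events(events):
    """Keep only the interesting events as (timestamp, type, reason code)."""
    rows = []
    for event in events:
        if event.get("type") not in INTERESTING:
            continue
        reason = event.get("details", {}).get("reason", {})
        rows.append((event.get("timestamp"), event["type"], reason.get("code")))
    return rows
```

You would call it along the lines of `summarize_events(fetch_cluster_events(host, token, cluster_id))`, using a personal access token with permission to view the cluster. The reason code on the `TERMINATING` events is normally the most useful part to quote when you contact support.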
Let me know if you need more help troubleshooting. Kindly take this thread seriously!