Data Engineering

Clusters Suddenly Failing - java.lang.RuntimeException: abort: DriverClient destroyed

Kayla
Contributor

Clusters that we've been using without issue for weeks have suddenly started failing at random. We can run a handful of cells and then hit "java.lang.RuntimeException: abort: DriverClient destroyed".
Has anyone run into this before?

Edit: I was able to trigger this error running nothing but "%sql select 1".
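
For reference, a Python-cell equivalent of that reproduction is sketched below; spark is the session Databricks provides in every notebook, and I'm assuming a Python cell fails the same way the %sql cell does.

# Sketch of the same trivial query issued from a Python cell instead of %sql.
# Assumes the failure is not specific to SQL cells; spark is the SparkSession
# Databricks injects into every notebook, so no imports are needed.
spark.sql("select 1").show()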


Cluster config and full error below:
{
    "cluster_id": "0815-110442-i3mwexgy",
    "creator_user_name": "REDACTED",
    "driver": {
        "private_ip": "127.0.0.1"
    },
    "spark_context_id": 2012540609986447000,
    "cluster_name": "Single Node Dev",
    "spark_version": "13.3.x-scala2.12",
    "spark_conf": {
        "spark.master": "local[*, 4]",
        "spark.databricks.cluster.profile": "singleNode"
    },
    "gcp_attributes": {
        "use_preemptible_executors": false,
        "google_service_account": "REDACTED",
        "availability": "ON_DEMAND_GCP",
        "zone_id": "auto"
    },
    "node_type_id": "n2-standard-4",
    "driver_node_type_id": "n2-standard-4",
    "custom_tags": {
        "ResourceClass": "SingleNode",
        "cluster_purpose": "ad_hoc"
    },
    "spark_env_vars": {
        "DEFAULT_DB": "dev_hlm"
    },
    "autotermination_minutes": 120,
    "enable_elastic_disk": false,
    "disk_spec": {},
    "cluster_source": "UI",
    "single_user_name": "REDACTED",
    "enable_local_disk_encryption": false,
    "instance_source": {
        "node_type_id": "n2-standard-4"
    },
    "driver_instance_source": {
        "node_type_id": "n2-standard-4"
    },
    "data_security_mode": "SINGLE_USER",
    "runtime_engine": "PHOTON",
    "effective_spark_version": "13.3.x-photon-scala2.12",
    "enable_serverless_compute": false,
    "state": "RUNNING",
    "start_time": 1700230606212,
    "last_state_loss_time": 1700230917459,
    "num_workers": 0,
    "cluster_memory_mb": 16384,
    "cluster_cores": 4,
    "default_tags": {
        "Vendor": "Databricks",
        "Creator": "REDACTED",
        "ClusterName": "SingleNodeDev",
        "ClusterId": "0815-110442-i3mwexgy"
    },
    "pinned_by_user_name": "107730441514877",
    "init_scripts_safe_mode": false,
    "spec": {
        "cluster_name": "Single Node Dev",
        "spark_version": "13.3.x-scala2.12",
        "spark_conf": {
            "spark.master": "local[*, 4]",
            "spark.databricks.cluster.profile": "singleNode"
        },
        "gcp_attributes": {
            "use_preemptible_executors": false,
            "google_service_account": "REDACTED",
            "availability": "ON_DEMAND_GCP",
            "zone_id": "auto"
        },
        "node_type_id": "n2-standard-4",
        "driver_node_type_id": "n2-standard-4",
        "custom_tags": {
            "ResourceClass": "SingleNode",
            "cluster_purpose": "ad_hoc"
        },
        "spark_env_vars": {
            "DEFAULT_DB": "dev_hlm"
        },
        "autotermination_minutes": 120,
        "enable_elastic_disk": false,
        "single_user_name": "REDACTED",
        "enable_local_disk_encryption": false,
        "data_security_mode": "SINGLE_USER",
        "runtime_engine": "PHOTON",
        "num_workers": 0
    }
}
Internal error. Attach your notebook to a different compute or restart the current compute.
java.lang.RuntimeException: abort: DriverClient destroyed
	at com.databricks.backend.daemon.driver.DriverClient.$anonfun$poll$3(DriverClient.scala:577)
	at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
	at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:54)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:77)
	at com.databricks.threading.DatabricksExecutionContext$InstrumentedRunnable.run(DatabricksExecutionContext.scala:36)
	at com.databricks.threading.NamedExecutor$$anon$2.$anonfun$run$2(NamedExecutor.scala:366)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:420)
	at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:418)
	at com.databricks.threading.NamedExecutor.withAttributionContext(NamedExecutor.scala:285)
	at com.databricks.threading.NamedExecutor$$anon$2.$anonfun$run$1(NamedExecutor.scala:364)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.context.integrity.IntegrityCheckContext$ThreadLocalStorage$.withValue(IntegrityCheckContext.scala:44)
	at com.databricks.threading.NamedExecutor$$anon$2.run(NamedExecutor.scala:356)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
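
To dig into why the driver went away, the cluster's event log can be pulled with the Clusters Events REST API (POST /api/2.0/clusters/events); events such as a driver restart or an unresponsive driver would line up with the "DriverClient destroyed" error. Rough sketch below, with the workspace URL and token read from placeholder environment variables rather than anything from this thread:

import os
import requests

# Placeholders: point these at your workspace URL and a personal access token.
host = os.environ["DATABRICKS_HOST"]   # e.g. https://<workspace>.gcp.databricks.com
token = os.environ["DATABRICKS_TOKEN"]

# Ask for the most recent events on the failing cluster.
resp = requests.post(
    f"{host}/api/2.0/clusters/events",
    headers={"Authorization": f"Bearer {token}"},
    json={"cluster_id": "0815-110442-i3mwexgy", "limit": 25},
)
resp.raise_for_status()

for event in resp.json().get("events", []):
    # Event types like DRIVER_NOT_RESPONDING, RESTARTING, or TERMINATING around
    # the failure time usually explain a destroyed DriverClient.
    print(event["timestamp"], event["type"], event.get("details", {}))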

2 REPLIES

Kaniz
Community Manager

Hi @Kayla, let's explore some potential solutions to address this issue:

  1. Cluster Configuration:

  2. Memory-Intensive Operations:

  3. Metastore Corruption:

  4. Check for Shared Queries:


Kayla
Contributor

@Kaniz wrote:
  • You mentioned that the same code worked before with a smaller 6-node cluster but started failing after upgrading to a 12-node cluster. 

I made no such claim. This was a single node cluster, failing on extremely basic operations. 
