Databricks Community

PKD28 · ‎09-05-2024

Jobs on our all-purpose Databricks cluster are failing with "the spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached".

The event log shows Event_type=DRIVER_NOT_RESPONDING and MESSAGE="Driver is up but is not responsive, likely due to GC."

Please help me to fix this.

szymon_dybczak · ‎09-05-2024

Hi @PKD28 ,

One common cause of this error is memory pressure on the driver. When that happens, the driver either crashes with an out-of-memory (OOM) error and gets restarted, or becomes unresponsive because of frequent full garbage collections. So, 9 times out of 10, heavy GC like this comes down to the driver running short of memory. As a first step, try increasing the driver's memory and see if that helps.

PKD28 · ‎09-05-2024

Just now there was another cluster issue.

Cluster error: Driver is unresponsive, likely due to GC

Cluster config:

Worker: Standard_D8ads_v5

Driver: Standard_E16d_v4

What do you suggest here?

szymon_dybczak · ‎09-05-2024

I would try a driver with more memory, just to check whether it can handle the load. For example, you could run the process on Standard_E20d_v4 or Standard_E32d_v4 (the latter has twice the RAM of your current Standard_E16d_v4 driver, so it should work).
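
To make the suggestion concrete, here is a minimal sketch of changing only the driver node type in a cluster spec. It assumes the spec uses the `node_type_id` and `driver_node_type_id` fields from the Databricks Clusters API; the cluster name and the helper `with_larger_driver` are made up for illustration, and the snippet only builds the updated spec, it does not call any API.

```python
import copy


def with_larger_driver(cluster_spec: dict, driver_node_type: str) -> dict:
    """Return a copy of the cluster spec with only the driver node type changed.

    Hypothetical helper for illustration; workers and other settings are untouched.
    """
    updated = copy.deepcopy(cluster_spec)
    updated["driver_node_type_id"] = driver_node_type
    return updated


# Node types are the ones from this thread; the cluster name is a placeholder.
current_spec = {
    "cluster_name": "example-all-purpose",    # hypothetical name
    "node_type_id": "Standard_D8ads_v5",      # worker node type
    "driver_node_type_id": "Standard_E16d_v4",  # current driver node type
}

new_spec = with_larger_driver(current_spec, "Standard_E32d_v4")
print(new_spec["driver_node_type_id"])
```

The same change can be made by hand in the cluster configuration UI by picking a different driver type; the point is that only the driver needs to change for this test, not the workers.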

Databricks Cluster job failure issue
