cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Databricks Cluster job failure issue

PKD28
New Contributor II

Jobs within the all purpose DB Cluster are failing with "the spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached"

In the event log it says "Event_type=DRIVER_NOT_RESPONDING & MESSAGE= "Driver is up but is not responsive, likely due to GC."

 

Please help me to fix this.

2 ACCEPTED SOLUTIONS

Accepted Solutions

szymon_dybczak
Contributor

Hi @PKD28 ,

One common cause for this error is that the driver is undergoing a memory bottleneck. When this happens, the driver crashes with an out of memory (OOM) condition and gets restarted or becomes unresponsive due to frequent full garbage collection. So, 9/10 times GC is due to out of memory exceptions. What you can try to do is to increase drivers memory first and see if that helps.

View solution in original post

I would try to use driver with higher amount of memory, just to check if it will handle the load. So maybe I'll try to run a process on Standard_E20d_v4 or Standard_E32d_v4 (this one has 2x more RAM memory, so it should work)

View solution in original post

3 REPLIES 3

szymon_dybczak
Contributor

Hi @PKD28 ,

One common cause for this error is that the driver is undergoing a memory bottleneck. When this happens, the driver crashes with an out of memory (OOM) condition and gets restarted or becomes unresponsive due to frequent full garbage collection. So, 9/10 times GC is due to out of memory exceptions. What you can try to do is to increase drivers memory first and see if that helps.

PKD28
New Contributor II

 

just now there is one cluster issue

cluster error: Driver is unresponsive likely due to GC

cluster conf:

worker: Standard_D8ads_v5

Driver: standard_E16d_v4

What do you suggest here ??

I would try to use driver with higher amount of memory, just to check if it will handle the load. So maybe I'll try to run a process on Standard_E20d_v4 or Standard_E32d_v4 (this one has 2x more RAM memory, so it should work)

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group