04-17-2024 11:48 PM
Hi all!
Recently we've been getting a lot of these errors when running Databricks notebooks. At the time of each failure we observed a DRIVER_NOT_RESPONDING ("Driver is up but is not responsive, likely due to GC.") event in the logs of the single-user cluster we use.
Previously, when this error appeared in the cluster logs it was due to one of two things:
- The number of notebooks attached to the cluster exceeded the limit of 145, or
- The cluster driver's memory was exhausted.
Lately neither of these seems to be the case, yet our notebooks still fail.
Do you have any idea what might be the problem here?
04-18-2024 02:44 PM
Are you able to get the full error stack trace from the driver's logs?
04-19-2024 01:30 AM
Unfortunately not for this event, but I will go through those logs carefully the next time this error happens. Thanks!
04-19-2024 04:12 AM
You may also try running the failing notebook on a job cluster.
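If it helps, a minimal sketch of what that might look like via the Jobs API 2.1: a job task that runs the notebook on its own ephemeral cluster, with a larger driver node than the workers. The notebook path, job name, and node/runtime values below are placeholders, not values from this thread, so adjust them to your workspace and cloud.

```json
{
  "name": "failing-notebook-as-job",
  "tasks": [
    {
      "task_key": "run_notebook",
      "notebook_task": {
        "notebook_path": "/Workspace/Users/me@example.com/my_notebook"
      },
      "new_cluster": {
        "spark_version": "14.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "driver_node_type_id": "Standard_DS4_v2",
        "num_workers": 2
      }
    }
  ]
}
```

Running on a dedicated job cluster also isolates the notebook from whatever else is attached to the shared cluster, which rules out the attached-notebook limit as a cause.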
06-06-2024 11:51 PM - edited 06-06-2024 11:52 PM
In case somebody else runs into the same issue: after investigation, Databricks support concluded that the driver's memory was overloaded (the 'Driver Not Responding' message in the event log). However, when this happens, the metrics and Spark UI details may be incomplete, which is probably why the metrics tab didn't show that driver memory was exhausted. The suggested solution can be found at the following link: https://kb.databricks.com/en_US/jobs/driver-unavailable
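For anyone who wants to check for this themselves: since the metrics tab may be incomplete when the driver is overloaded, one rough cross-check is to look for full-GC activity in the driver's own logs. This is only a sketch; the log path and the "Full GC" pattern are assumptions about typical JVM GC output on a Databricks driver, not a documented interface, so verify where GC logging lands on your runtime.

```shell
#!/bin/sh
# Hedged sketch: count full-GC events in the driver's stdout log as a
# rough signal of driver memory pressure. Both the default path and the
# "Full GC" pattern are assumptions; override the path via DRIVER_LOG.
LOG="${DRIVER_LOG:-/databricks/driver/logs/stdout}"
if [ -f "$LOG" ]; then
  # Each "Full GC" line is one stop-the-world collection; a large count
  # suggests the driver JVM is struggling for heap.
  FULL_GC_COUNT=$(grep -c "Full GC" "$LOG")
else
  FULL_GC_COUNT="unknown (log not found: $LOG)"
fi
echo "Full GC events in driver log: $FULL_GC_COUNT"
```

A steadily climbing count right before a DRIVER_NOT_RESPONDING event would line up with the explanation from support above.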