Spark Driver failed due to DRIVER_UNAVAILABLE but not due to memory pressure
04-03-2024 06:09 PM
Hello,
I have a job cluster running a streaming job, and it unexpectedly failed on 19 March with DRIVER_UNAVAILABLE (Request timed out, Driver is temporarily unavailable) in the event log. This is the run: https://atlassian-discover.cloud.databricks.com/jobs/323849284041517/runs/395169892801478?o=44820012...
I'm aware of an article describing the same problem: https://kb.databricks.com/en_US/jobs/driver-unavailable. It points out that memory pressure is a common cause. However, according to the driver stdout, there were only minor GCs, each taking around 30-40 ms, around the time the driver became unavailable:
I also checked the driver log (log4j logs). It contains no error messages, and the few warnings in it are unrelated. In fact, the driver continued writing logs for several minutes after the DRIVER_UNAVAILABLE error appeared in the event log.
I tried loading the Spark UI, but after a long wait with messages saying it was processing files, it failed with the following message, so I can't view the Spark history UI either:
Could anyone help, please?
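For anyone repeating the GC check described above, here is a minimal Python sketch that scans driver stdout for GC pauses above a threshold. It assumes JDK 8 style GC lines (as produced by `-XX:+PrintGCDetails`); the function name `find_gc_pauses`, the threshold, and the sample lines are all hypothetical and are not taken from the actual run.

```python
import re

# Assumption: GC events look like JDK 8 -XX:+PrintGCDetails output, e.g.
#   "...: [GC (Allocation Failure) [PSYoungGen: ...] ..., 0.0345678 secs] [Times: ...]"
# The first "<number> secs]" after "[GC" or "[Full GC" is taken as the pause.
GC_LINE = re.compile(r"\[(?P<kind>Full GC|GC)\b.*?(?P<secs>\d+\.\d+) secs\]")


def find_gc_pauses(lines, threshold_secs=1.0):
    """Return (kind, seconds, line) for each GC event at or above the threshold."""
    hits = []
    for line in lines:
        match = GC_LINE.search(line)
        if match is None:
            continue  # not a GC line in the assumed format
        secs = float(match.group("secs"))
        if secs >= threshold_secs:
            hits.append((match.group("kind"), secs, line.rstrip()))
    return hits


# Synthetic sample lines, for illustration only (not from the real driver stdout).
SAMPLE = [
    "2024-03-19T10:15:30.123+0000: [GC (Allocation Failure) "
    "[PSYoungGen: 1234K->567K(8901K)] 2345K->1678K(9012K), 0.0345678 secs] "
    "[Times: user=0.10 sys=0.01, real=0.03 secs]",
    "2024-03-19T10:16:02.456+0000: [Full GC (Ergonomics) "
    "[PSYoungGen: 10K->0K(20K)] [ParOldGen: 30K->25K(40K)] 40K->25K(60K), "
    "[Metaspace: 5K->5K(10K)], 2.5000000 secs] "
    "[Times: user=4.00 sys=0.10, real=2.50 secs]",
    "24/03/19 10:16:05 INFO SomeLogger: an unrelated log line",
]

if __name__ == "__main__":
    for kind, secs, _ in find_gc_pauses(SAMPLE, threshold_secs=1.0):
        print(f"{kind}: {secs:.3f}s")
```

If a scan like this only turns up short minor GCs near the failure time, as reported here, long GC pauses become a less likely explanation, although that alone does not rule out other forms of memory pressure.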
- Labels: Spark
04-04-2024 10:16 PM
Hi @duliu, hope you are doing well!
Could you check whether the KB article below addresses your problem?
https://kb.databricks.com/en_US/jobs/driver-unavailable
Please let me know if this helps, and leave a like if the information is useful. Follow-ups are appreciated.
Kudos
Ayushi

