Where are "Driver logs" stored by default, and how much space is allotted to them?
06-19-2024 01:23 AM
Recently, while using Databricks for deep learning, I ran into an issue: after a certain amount of execution time, the cluster would break and restart. The logs are as follows:
- The event log displayed "Metastore is down; Driver is up but is not responsive, likely due to GC.; Spark exception received from driver. Driver down cause: driver state change (exit ...)".
- The driver logs displayed "echo: write error: no space left on device".
Specifically, my program prints a lot of output, and since everything printed is captured in the driver logs, I suspect the cluster is breaking because the driver logs exhaust the available space. So I would like to know:
- Where are driver logs stored by default, and what is the size limit? (See the disk-check sketch after this list.)
- Is the "Driver is up but is not responsive, likely due to GC" issue caused by a driver-log size limitation? This link seems to give an explanation, though I am not sure whether it is correct: Job cluster limits on notebook output - Databricks
- If the problem does come from the driver logs, can it be solved by changing `create cluster`→`Advanced Options`→`Logging`→`Destination`?
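A minimal sketch of how to check free disk space on the driver from a notebook cell (the /local_disk0 and /databricks/driver paths are assumptions about the typical node layout; adjust for your environment):

```python
import shutil

# Check free space on the driver node from a notebook cell.
# /local_disk0 is assumed to be the driver's local scratch disk and
# /databricks/driver the directory where the driver usually writes logs.
for path in ("/", "/local_disk0", "/databricks/driver"):
    try:
        usage = shutil.disk_usage(path)
        print(f"{path}: {usage.free / 1e9:.1f} GB free of {usage.total / 1e9:.1f} GB")
    except FileNotFoundError:
        print(f"{path}: not present on this node")
```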
Looking forward to a reply from the experts. Thank you very much!
06-19-2024 10:57 AM
The error message "echo: write error: no space left on device" indicates that the disk the driver logs are written to has run out of space.
By default, Databricks stores driver logs on the local disk of the driver node. There is no single fixed size limit; the space available depends on your cluster configuration, in particular the instance type you choose for the driver and the size of its local disk.
The "Driver is up but is not responsive, likely due to GC" issue can indeed stem from memory pressure: long garbage-collection (GC) pauses make the driver unresponsive while the JVM tries to free memory. The link you provided explains job output limits, which may be relevant if your program generates a large amount of output that is captured in the logs.
Modifying `create cluster`→`Advanced Options`→`Logging`→`Destination` to change where the logs are delivered could help solve this problem: consider directing the logs to a location with more available storage space.
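As a sketch, the same Logging→Destination setting can also be supplied programmatically through the `cluster_log_conf` field of the Databricks Clusters API. The workspace URL, token, cluster name, runtime version, and node type below are all placeholders:

```python
import requests

# Placeholders: substitute your workspace URL and a valid access token.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

payload = {
    "cluster_name": "dl-training",        # example name
    "spark_version": "14.3.x-scala2.12",  # example runtime
    "node_type_id": "i3.xlarge",          # example node type
    "num_workers": 2,
    # Equivalent to Advanced Options > Logging > Destination in the UI:
    # cluster logs are periodically delivered to this DBFS path.
    "cluster_log_conf": {
        "dbfs": {"destination": "dbfs:/cluster-logs/dl-training"}
    },
}

resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # returns the new cluster_id on success
```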
06-22-2024 01:11 AM
Thanks for your reply. However, even after setting `create cluster`→`Advanced Options`→`Logging`→`Destination`, the "echo: write error: no space left on device" error still appears. I changed the destination to "/dbfs/FileStore", which has plenty of space. Can you help me? (Very distressed.)
06-29-2024 09:12 AM
Hello Jaron, is it not possible for you to redirect the logs to an ABFSS or S3 bucket?
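For reference, on AWS workspaces the Clusters API accepts an `s3` block in `cluster_log_conf` (bucket, prefix, and region below are placeholders, and the cluster's instance profile needs write access to the bucket); on Azure, log delivery has historically targeted DBFS paths, so check the platform docs before assuming an ABFSS URI is accepted here:

```python
# Sketch of an S3 log destination for cluster_log_conf (AWS workspaces).
# Bucket name, prefix, and region are placeholders.
s3_log_conf = {
    "s3": {
        "destination": "s3://my-log-bucket/cluster-logs",
        "region": "us-west-2",
    }
}
```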
06-29-2024 07:07 PM
Hi Walter_C, I have tried redirecting the .log files to another destination. However, I found that redirection through `create cluster`→`Advanced Options`→`Logging`→`Destination` is a copy rather than a move: the logs are periodically copied to the destination, so the driver's local log files still grow. (The Databricks Spark UI was no help here; it could not display any useful information, including driver and executor memory usage.)
In the end, I reluctantly switched to a larger driver node to work around the problem.
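For anyone hitting the same issue: since log delivery copies rather than moves the files, the other lever is to reduce what the driver writes in the first place. A minimal sketch, assuming the chatty output comes from print() calls in a training loop, is to route it through Python's logging module so routine messages can be silenced with a single level change (the step and loss values are illustrative only):

```python
import logging

# Route progress output through logging instead of print(), so per-step
# chatter can be turned off with one level change when driver-log growth
# becomes a problem.
logging.basicConfig(level=logging.WARNING)  # suppresses INFO-level chatter
log = logging.getLogger("training")

for step in range(100_000):
    loss = 0.0  # placeholder for the real training step
    log.info("step %d loss %.4f", step, loss)       # dropped at WARNING level
    if step % 10_000 == 0:
        log.warning("checkpoint at step %d", step)  # still emitted
```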

