Where are the "Driver logs" stored by default, and how much space is available for them by default?

Jaron
New Contributor III

Recently, while using Databricks for deep learning, I ran into an issue: after running for a certain amount of time, the cluster would break and restart. The logs are as follows:

  • The Event logs displayed "Metastore is down; Driver is up but is not responsive, likely due to GC.; Spark exception received from driver. Driver down cause: driver state change (exit ...)".
  • The Driver logs displayed "echo: write error: no space left on device".

Specifically, my program prints a lot of output, and since everything it prints is written to the Driver logs, I suspect that the cluster breaks because of an OOM in the driver logs. So, I would like to know:

  1. Where is the default storage location for driver logs, and what is its size limit?
  2. Is the "Driver is up but is not responsive, likely due to GC" issue caused by a memory limitation of the driver logs? This link seems to give an explanation, but I am not sure whether it is correct: "Job cluster limits on notebook output - Databricks".
  3. If this problem comes from the driver logs, can it be solved by modifying `create cluster`→`Advanced Options`→`Logging`→`Destination`?

Looking forward to getting a reply from the experts, thank you very much!

 

4 REPLIES

Walter_C
Honored Contributor

The error message "echo: write error: no space left on device" indicates that the storage space for the driver logs might be full.

The default storage location for driver logs in Databricks is the local disk of the driver node. The exact size limit, however, varies with the configuration of your Databricks environment, in particular the driver's node type and the disk attached to it.
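Since the logs live on the driver's local disk, one way to see how much room is left is to check the filesystem from a notebook cell. The following is a minimal sketch using only the Python standard library. It assumes the cell runs on the driver node and that the root filesystem `/` is the one that fills up; the helper name `disk_report` is made up for this example.

```python
import shutil


def disk_report(path="/"):
    """Return (total, used, free) in GiB for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return tuple(round(v / gib, 2) for v in (usage.total, usage.used, usage.free))


# Assumption: "/" is the filesystem holding the driver logs; pass another path if not.
total, used, free = disk_report("/")
print(f"total={total} GiB, used={used} GiB, free={free} GiB")
```

Running this periodically during a long job would show whether free space shrinks steadily before the "no space left on device" error appears.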

The issue "Driver is up but is not responsive, likely due to GC" could indeed be due to memory limitations. Garbage Collection (GC) pauses can make the driver unresponsive if the system is trying to free up memory space. The link you provided does give an explanation related to job output limits, which might be related if your program is generating a large amount of output that is being logged.

Modifying `create cluster`→`Advanced Options`→`Logging`→`Destination` to change the storage location for logs could potentially help solve this problem. You could consider directing the logs to a location with more available storage space.
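For clusters created through the Clusters API rather than the UI, the same setting is expressed by the `cluster_log_conf` field of the cluster specification. The sketch below only builds that fragment of the request body and does not call any API. The helper name `with_log_delivery`, the cluster name, and the `dbfs:/cluster-logs` path are placeholders for illustration, and the check for a `dbfs:/` prefix is an assumption of this example rather than documented validation behavior.

```python
import json


def with_log_delivery(cluster_spec, destination):
    """Return a copy of cluster_spec with a DBFS log destination set."""
    # Assumption for this sketch: the destination is written as a dbfs:/ URI.
    if not destination.startswith("dbfs:/"):
        raise ValueError("expected a dbfs:/ URI, got " + repr(destination))
    spec = dict(cluster_spec)
    spec["cluster_log_conf"] = {"dbfs": {"destination": destination}}
    return spec


# Placeholder cluster name and path, not taken from the thread.
spec = with_log_delivery({"cluster_name": "example-cluster"}, "dbfs:/cluster-logs")
print(json.dumps(spec, indent=2))
```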

 

Jaron
New Contributor III

Thanks for your reply. However, even after I set `create cluster`→`Advanced Options`→`Logging`→`Destination` to another destination, the "echo: write error: no space left on device" error still appears. I changed the destination to "/dbfs/FileStore", where there is plenty of space. Can you help me? (Very distressed.)

Walter_C
Honored Contributor

Hello Jaron, is it not possible for you to redirect the logging to an ABFSS location or an S3 bucket?

Jaron
New Contributor III

Hi Walter_C, I have tried redirecting the .log file to another destination. However, I found that redirection through `create cluster`→`Advanced Options`→`Logging`→`Destination` is a copy rather than a move, which means the driver log will still keep growing. (The Spark UI of Databricks is useless and cannot display any valid information, including the memory usage of the driver and executors.)

Finally, I reluctantly switched to a larger driver to solve this problem.
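An alternative to a larger driver, not tried in this thread, would be to reduce how much the program prints in the first place, since every print ends up in the driver log. Below is a minimal sketch of a rate-limited printer in plain Python. The class name `ThrottledPrinter` and the 10-second interval are illustrative choices, not anything Databricks provides.

```python
import time


class ThrottledPrinter:
    """Print at most one message every `min_interval` seconds; count the rest."""

    def __init__(self, min_interval=10.0, clock=time.monotonic, out=print):
        self.min_interval = min_interval
        self.clock = clock      # injectable so the timing can be tested
        self.out = out          # injectable so the output can be captured
        self.last = None        # time of the last emitted message
        self.skipped = 0        # messages suppressed since the last emit

    def __call__(self, message):
        now = self.clock()
        if self.last is not None and now - self.last < self.min_interval:
            self.skipped += 1
            return False
        suffix = f" ({self.skipped} messages suppressed)" if self.skipped else ""
        self.out(f"{message}{suffix}")
        self.last = now
        self.skipped = 0
        return True


log = ThrottledPrinter(min_interval=10.0)
for step in range(1000):
    log(f"step {step}")  # only the first call in each 10-second window prints
```

In a training loop, replacing per-step `print` calls with something like this keeps progress visible while limiting how fast the driver log grows.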
