Community Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

Where are the "Driver logs" stored by default, and how much space is available for them by default?

Jaron
New Contributor II

Recently, while using Databricks for deep learning, I ran into an issue: after running for a certain amount of time, the cluster would break and restart. The logs show the following:

  • The Event logs displayed "Metastore is down; Driver is up but is not responsive, likely due to GC.; Spark exception received from driver. Driver down cause: driver state change (exit ...)".
  • The Driver logs displayed "echo: write error: no space left on device".

Specifically, my program prints a lot of output, and since all prints are recorded in the Driver logs, I suspect that the cluster breaks because the driver logs exhaust the available space or memory. So, I would like to know:

  1. Where is the default storage location for driver logs, and how large is it?
  2. Is the "Driver is up but is not responsive, likely due to GC" issue caused by a memory limit on driver logs? The Databricks article "Job cluster limits on notebook output" seems to give an explanation, but I am not sure whether it is correct.
  3. If this problem comes from driver logs, can it be solved by modifying `create cluster`→`Advanced Options`→`Logging`→`Destination`?

Looking forward to getting a reply from the experts, thank you very much!

 

2 REPLIES

Walter_C
Valued Contributor III

The error message "echo: write error: no space left on device" indicates that the storage space for the driver logs might be full.

The default storage location for driver logs in Databricks is the local disk of the driver node. There is no single fixed size limit: the available space depends on your cluster configuration, such as the driver's instance type and the disk attached to it.
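Because the available space depends on the cluster, it may be easier to measure it than to look it up. The following is a minimal sketch, meant to be run in a notebook cell on the driver, that reports disk usage for a few candidate mount points. The paths `/databricks/driver` and `/local_disk0` are assumptions about where the driver keeps its working files; paths that do not exist are skipped.

```python
import shutil

GIB = 1024 ** 3


def disk_report(paths):
    """Return {path: (used_gib, total_gib, percent_used)} for each path that exists."""
    report = {}
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except OSError:
            continue  # path is missing or unreadable on this machine
        if usage.total == 0:
            continue  # skip pseudo-filesystems that report no capacity
        percent = 100 * usage.used / usage.total
        report[path] = (usage.used / GIB, usage.total / GIB, percent)
    return report


# Assumed candidate locations; adjust them to match your cluster.
candidates = ["/", "/tmp", "/databricks/driver", "/local_disk0"]
for path, (used, total, percent) in disk_report(candidates).items():
    print(f"{path}: {used:.1f} GiB used of {total:.1f} GiB ({percent:.0f}%)")
```

Running this periodically during a long job should show whether a local volume is filling up before the "no space left on device" error appears.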

The issue "Driver is up but is not responsive, likely due to GC" could indeed be due to memory limitations. Garbage Collection (GC) pauses can make the driver unresponsive if the system is trying to free up memory space. The link you provided does give an explanation related to job output limits, which might be related if your program is generating a large amount of output that is being logged.
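If the volume of printed output is the cause, one option is to reduce it at the source. The sketch below is not a Databricks feature, just standard Python logging: it writes progress messages to a size-capped rotating file instead of printing every step, and only logs every `LOG_EVERY` steps. The logger name, the log path, and the training loop are all hypothetical placeholders.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def make_bounded_logger(log_path, max_bytes=10 * 1024 * 1024, backups=3):
    """Create a logger whose files on disk never exceed max_bytes * (backups + 1)."""
    logger = logging.getLogger("training")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # do not forward records to the root logger's handlers
    if not logger.handlers:  # avoid adding duplicate handlers when a cell is re-run
        handler = RotatingFileHandler(log_path, maxBytes=max_bytes, backupCount=backups)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger


log_path = os.path.join(tempfile.gettempdir(), "training.log")  # hypothetical location
logger = make_bounded_logger(log_path)

LOG_EVERY = 100  # log one step in every 100 instead of every step
for step in range(1000):
    loss = 1.0 / (step + 1)  # placeholder for a real training step
    if step % LOG_EVERY == 0:
        logger.info("step=%d loss=%.4f", step, loss)
```

With these settings the log files occupy at most about 40 MiB, regardless of how long the job runs.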

Modifying `create cluster`→`Advanced Options`→`Logging`→`Destination` to change the storage location for logs could potentially help solve this problem. You could consider directing the logs to a location with more available storage space.
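For clusters created through the Clusters API rather than the UI, the same setting is expressed with the `cluster_log_conf` field. Here is a small sketch that builds that fragment as a Python dictionary. The cluster name and destination are made-up examples, and the check for a `dbfs:/` prefix is an assumption of this sketch, based on the documented examples using that URI scheme rather than a `/dbfs/...` mount path.

```python
import json


def with_log_delivery(cluster_spec, destination):
    """Return a copy of cluster_spec with DBFS log delivery configured."""
    # Assumption: the destination is a dbfs:/ URI, as in the documented examples.
    if not destination.startswith("dbfs:/"):
        raise ValueError(f"expected a dbfs:/ URI, got {destination!r}")
    updated = dict(cluster_spec)
    updated["cluster_log_conf"] = {"dbfs": {"destination": destination}}
    return updated


# Hypothetical cluster name and destination, for illustration only.
spec = with_log_delivery({"cluster_name": "dl-training"}, "dbfs:/cluster-logs")
print(json.dumps(spec, indent=2))
```

As far as I understand, log delivery periodically copies logs to the destination and does not by itself change where the driver writes them locally, so it is worth checking local disk usage as well.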

 

Jaron
New Contributor II

Thanks for your reply. However, even after I changed `create cluster`→`Advanced Options`→`Logging`→`Destination` to a different destination, "echo: write error: no space left on device" still appears. I changed the destination to "/dbfs/FileStore", where there is plenty of space. Can you help me? (I am very distressed.)
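Since the error persists after changing the destination, it may help to find out what is actually filling the driver's local disk. The following is a rough diagnostic sketch, not an official tool: it lists the largest entries directly under a directory. The starting path `/databricks/driver` is an assumption; if it does not exist, the function simply returns an empty list.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all files under root, ignoring unreadable entries."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is not accessible
    return total


def largest_children(root, top=5):
    """Return up to `top` (size_bytes, path) pairs for entries directly under root."""
    sizes = []
    try:
        entries = list(os.scandir(root))
    except OSError:
        return sizes  # root is missing or unreadable
    for entry in entries:
        try:
            if entry.is_dir(follow_symlinks=False):
                size = dir_size_bytes(entry.path)
            else:
                size = entry.stat(follow_symlinks=False).st_size
        except OSError:
            continue
        sizes.append((size, entry.path))
    return sorted(sizes, reverse=True)[:top]


# Assumed starting point; try "/" or "/tmp" if this path does not exist.
for size, path in largest_children("/databricks/driver"):
    print(f"{size / 1024 ** 2:10.1f} MiB  {path}")
```

If the largest entries turn out to be checkpoints, datasets, or caches rather than log files, the fix lies in where the job writes its own files, not in the log destination.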
