<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How do I avoid the "No space left on device" error where my disk is running out of space? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30755#M22328</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Had to update this line&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;val spaceInGB = ("df /local_disk".!!).split(" +")(9).toInt / 1024 / 1024&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;to&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;val spaceInGB = ("df /local_disk0".!!).split(" +")(9).toInt / 1024 / 1024&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;In Databricks 7.3.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 30 Nov 2020 23:03:44 GMT</pubDate>
    <dc:creator>Capemo</dc:creator>
    <dc:date>2020-11-30T23:03:44Z</dc:date>
    <item>
      <title>How do I avoid the "No space left on device" error where my disk is running out of space?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30746#M22319</link>
      <description />
      <pubDate>Tue, 24 Feb 2015 23:51:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30746#M22319</guid>
      <dc:creator>cfregly</dc:creator>
      <dc:date>2015-02-24T23:51:39Z</dc:date>
    </item>
    <item>
      <title>Re: How do I avoid the "No space left on device" error where my disk is running out of space?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30747#M22320</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;P&gt;This error indicates that the Worker's local disks are filling up.&lt;/P&gt;&lt;P&gt;&lt;B&gt;Context of the Error&lt;/B&gt;&lt;/P&gt;&lt;P&gt;A Worker's local disk is used by Spark for the following:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Intermediate shuffle files&lt;UL&gt;&lt;LI&gt;Contain the RDD's parent dependency data (lineage)&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;RDD persistence&lt;UL&gt;&lt;LI&gt;StorageLevel.MEMORY_AND_DISK&lt;/LI&gt;&lt;LI&gt;StorageLevel.DISK_ONLY&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;You can inspect the amount of local disk space before and after your shuffle as follows (Scala):&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import scala.sys.process._
val perNodeSpaceInGB = sc.parallelize(0 to 100).map { _ =&amp;gt;
val hostname = ("hostname".!!).trim
val spaceInGB = ("df /local_disk".!!).split(" +")(9).toInt / 1024 / 1024
//System.gc()
(hostname, spaceInGB)
}.collect.distinct
println(f"There are ${perNodeSpaceInGB.size} nodes in this cluster. Per node free space (in GB):\n--------------------------------------")
perNodeSpaceInGB.foreach{case (a, b) =&amp;gt; println(f"$a\t\t$b%d")}
val totalSpaceInGB = perNodeSpaceInGB.map(_._2).sum
println(f"---------------------------&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;B&gt;Causes of the Error&lt;/B&gt;&lt;/P&gt;&lt;P&gt;Intermediate shuffle files that contain an RDD's parent dependency data (lineage) hang around on the Workers in case the RDD needs to be recovered from its parents.&lt;/P&gt;&lt;P&gt;If the intermediate shuffle files are not removed quickly enough, they can cause the "No space left on device" error to occur on a Worker.&lt;/P&gt;&lt;P&gt;Here is an example that might lead to intermediate shuffle files not being cleaned up (Python):&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;# Define an RDD which creates some shuffles
myRdd = sc.textFile(...).groupByKey(...).map(...) 
myRdd.count()&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;When this is run, the local &lt;PRE&gt;&lt;CODE&gt;myRdd&lt;/CODE&gt;&lt;/PRE&gt; variable will prevent the removal of the intermediate shuffle files on the Workers.&lt;/P&gt;&lt;P&gt;Another, more subtle, example of a dangling RDD reference is this: consider a notebook cell with a single &lt;PRE&gt;&lt;CODE&gt;unpersist&lt;/CODE&gt;&lt;/PRE&gt; call:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;myRdd.unpersist()&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;PRE&gt;&lt;CODE&gt;RDD.unpersist()&lt;/CODE&gt;&lt;/PRE&gt; returns a reference to the RDD being unpersisted. The last value in a notebook cell is automatically assigned to an &lt;PRE&gt;&lt;CODE&gt;Out[someNumber]&lt;/CODE&gt;&lt;/PRE&gt; variable in the Python interpreter.&lt;/P&gt;&lt;P&gt;This subtle variable can keep the RDD alive and prevent the removal of intermediate shuffle files. This problem isn't specific to &lt;PRE&gt;&lt;CODE&gt;unpersist()&lt;/CODE&gt;&lt;/PRE&gt;, either: I think that any case where you have an RDD as the final element of a notebook cell may lead to a reference to the RDD that prevents the removal of intermediate shuffle files.&lt;/P&gt;&lt;P&gt;There might be a way to clear the &lt;PRE&gt;&lt;CODE&gt;Out&lt;/CODE&gt;&lt;/PRE&gt; variables to force them to be cleaned up, but I'm not sure offhand.&lt;/P&gt;&lt;P&gt;Consider using functions to limit the scope of RDD references.&lt;/P&gt;&lt;P&gt;&lt;B&gt;Workaround 1:  Explicitly Remove Intermediate Shuffle Files&lt;/B&gt;&lt;/P&gt;&lt;P&gt;The intermediate shuffle files on the Workers are removed from disk when the RDD is freed and goes out of scope.&lt;/P&gt;&lt;P&gt;&lt;PRE&gt;&lt;CODE&gt;RDD.unpersist()&lt;/CODE&gt;&lt;/PRE&gt; is one way for the RDD to go out of scope. Also, you can explicitly re-assign the RDD variable to &lt;PRE&gt;&lt;CODE&gt;None&lt;/CODE&gt;&lt;/PRE&gt; or &lt;PRE&gt;&lt;CODE&gt;null&lt;/CODE&gt;&lt;/PRE&gt; when you're done using it.&lt;/P&gt;&lt;P&gt;These mechanisms will flag the intermediate shuffle files for removal. (Note: this may not be desirable if you need to keep the RDD around for later processing.)&lt;/P&gt;&lt;P&gt;Upon GC, the Spark &lt;A href="https://github.com/apache/spark/blob/v1.2.1/core/src/main/scala/org/apache/spark/ContextCleaner.scala" target="_blank"&gt;ContextCleaner&lt;/A&gt; will remove the flagged intermediate shuffle files on all Workers that contributed to the lineage of the RDD that was freed.&lt;/P&gt;&lt;P&gt;&lt;I&gt;In other words, a GC - which is usually meant to free up memory - is also used by Spark to free up the intermediate shuffle files on Workers via the ContextCleaner.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;If the GC required by the intermediate shuffle file cleaner process is not happening fast enough on its own, you can explicitly call &lt;PRE&gt;&lt;CODE&gt;System.gc()&lt;/CODE&gt;&lt;/PRE&gt; in Scala or &lt;PRE&gt;&lt;CODE&gt;sc._jvm.System.gc()&lt;/CODE&gt;&lt;/PRE&gt; in Python to nudge the JVM into a GC and ultimately remove the intermediate shuffle files. While this technically isn't guaranteed to force a GC, it has proven effective for users in this situation.&lt;/P&gt;
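&lt;P&gt;A minimal PySpark sketch of Workaround 1 (the input path and RDD name are just placeholders):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# Build an RDD that involves a shuffle, then release it explicitly
myRdd = sc.textFile("/some/input/path").map(lambda line: (line, 1)).groupByKey()
myRdd.count()

myRdd.unpersist()      # flags the RDD's cached/shuffle state for cleanup
myRdd = None           # drop the last reference so nothing keeps the lineage alive
sc._jvm.System.gc()    # nudge the driver JVM so the ContextCleaner runs sooner&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;In a notebook, avoid leaving the &lt;PRE&gt;&lt;CODE&gt;unpersist()&lt;/CODE&gt;&lt;/PRE&gt; call as the last expression of a cell, since its return value would otherwise land in an &lt;PRE&gt;&lt;CODE&gt;Out[someNumber]&lt;/CODE&gt;&lt;/PRE&gt; variable as described above.&lt;/P&gt;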
&lt;P&gt;&lt;B&gt;Workaround 2:  Use More Workers&lt;/B&gt;&lt;/P&gt;&lt;P&gt;Assuming even distribution of partitions, adding more Workers will - on average - reduce the disk space required for the intermediate shuffle files on each Worker.&lt;/P&gt;&lt;P&gt;&lt;B&gt;Workaround 3:  Checkpoint the RDD&lt;/B&gt;&lt;/P&gt;&lt;P&gt;Another solution, used by Spark Streaming in particular, is to periodically call &lt;PRE&gt;&lt;CODE&gt;RDD.checkpoint()&lt;/CODE&gt;&lt;/PRE&gt;. This saves the current immutable state of the RDD to S3, snips the RDD lineage, and allows the intermediate shuffle files to be removed.&lt;/P&gt;&lt;P&gt;This requires a prior call to &lt;PRE&gt;&lt;CODE&gt;sc.setCheckpointDir()&lt;/CODE&gt;&lt;/PRE&gt; with something like &lt;PRE&gt;&lt;CODE&gt;/checkpoints&lt;/CODE&gt;&lt;/PRE&gt;. This will save the checkpoint data to DBFS/S3 in that location.&lt;/P&gt;&lt;P&gt;&lt;I&gt;This is the best of both worlds:  the RDD is still recoverable, but the intermediate shuffle files can be removed from the Workers.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Workaround 4: [Spark SQL Only] Increase Shuffle Partitions&lt;/B&gt;&lt;/P&gt;&lt;P&gt;If you're seeing this with Spark SQL HiveQL commands, you can try increasing the number of Spark SQL shuffle partitions as follows:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;SET spark.sql.shuffle.partitions=400;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 25 Feb 2015 00:01:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30747#M22320</guid>
      <dc:creator>cfregly</dc:creator>
      <dc:date>2015-02-25T00:01:17Z</dc:date>
    </item>
    <item>
      <title>Re: How do I avoid the "No space left on device" error where my disk is running out of space?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30748#M22321</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;@cfregly&lt;/P&gt;
&lt;P&gt;Great post! I am getting the above error with Spark SQL HiveQL commands. Can you please explain how increasing the "spark.sql.shuffle.partitions" property helps? What else can be done to avoid the space issue in Spark SQL?&lt;/P&gt;
&lt;P&gt;Thank You.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 25 Apr 2016 07:26:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30748#M22321</guid>
      <dc:creator>PrinceBhatti</dc:creator>
      <dc:date>2016-04-25T07:26:45Z</dc:date>
    </item>
    <item>
      <title>Re: How do I avoid the "No space left on device" error where my disk is running out of space?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30749#M22322</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;This is helpful. So do you mean the suggested way in Python is like the following?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;t = myRdd.unpersist()
t = None&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Besides, could you point me to any doc about the Python Out variable? I am new to Python and did not find that.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 26 Nov 2016 07:40:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30749#M22322</guid>
      <dc:creator>Wanglei</dc:creator>
      <dc:date>2016-11-26T07:40:31Z</dc:date>
    </item>
    <item>
      <title>Re: How do I avoid the "No space left on device" error where my disk is running out of space?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30750#M22323</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Yeah, not sure how that really helps in this case. Any explanation?&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jul 2017 07:31:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30750#M22323</guid>
      <dc:creator>ManojMalicious</dc:creator>
      <dc:date>2017-07-26T07:31:51Z</dc:date>
    </item>
    <item>
      <title>Re: How do I avoid the "No space left on device" error where my disk is running out of space?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30751#M22324</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;The 1st Workaround (Explicitly Remove Intermediate Shuffle Files) worked for me. Thank you.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 08 Aug 2017 15:04:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30751#M22324</guid>
      <dc:creator>co_dragos</dc:creator>
      <dc:date>2017-08-08T15:04:58Z</dc:date>
    </item>
    <item>
      <title>Re: How do I avoid the "No space left on device" error where my disk is running out of space?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30752#M22325</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;This is a generic problem.&lt;/P&gt;
&lt;P&gt;A cheap solution is to increase the number of shuffle partitions (in case loads are skewed) or to restart the cluster.&lt;/P&gt;
&lt;P&gt;A safer solution is to increase the cluster size or node sizes (SSD, RAM, …).&lt;/P&gt;
&lt;P&gt;Ultimately, you have to make sure your code is efficient: read and write as you go (do not keep things in memory, but instead process like a streaming pipeline from source to sink). Things like repartition can break this efficiency.&lt;/P&gt;
&lt;P&gt;Also make sure that you are not overwriting a cached variable. For example, the code below is wrong:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;df=…cache()
df=df.withColumn(…..).cache()&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Instead, put an unpersist between the two lines. Otherwise there is an orphaned reference to cached data.&lt;/P&gt;
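&lt;P&gt;A minimal PySpark sketch of that pattern (the path and column names are just placeholders):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;df = spark.read.parquet("/some/table").cache()
df.count()          # materialize the first cached copy

df.unpersist()      # release the old cached copy before rebinding the name
df = df.withColumn("doubled", df["value"] * 2).cache()&lt;/CODE&gt;&lt;/PRE&gt;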
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 07 Sep 2017 08:48:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30752#M22325</guid>
      <dc:creator>ReKa</dc:creator>
      <dc:date>2017-09-07T08:48:41Z</dc:date>
    </item>
    <item>
      <title>Re: How do I avoid the "No space left on device" error where my disk is running out of space?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30753#M22326</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Any recommendation on how to work around this issue when using Spark SQL only?&lt;/P&gt;&lt;P&gt;We already ran SET spark.sql.shuffle.partitions=XXX; with a couple of different values, but it still keeps failing. Also, the cluster size / number of workers should be more than sufficient.&lt;/P&gt;
&lt;P&gt;Could the enable_elastic_disk setting on the cluster (https://docs.azuredatabricks.net/api/latest/clusters.html) help with this?&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;-gerhard&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jun 2019 07:27:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30753#M22326</guid>
      <dc:creator>gbrueckl</dc:creator>
      <dc:date>2019-06-11T07:27:06Z</dc:date>
    </item>
    <item>
      <title>Re: How do I avoid the "No space left on device" error where my disk is running out of space?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30754#M22327</link>
      <description>&lt;P&gt;I have 8 GB of internal memory with only a few MB free, but I also have an additional 8 GB memory card. Either way, there is not enough space, even though the memory card is completely empty.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jul 2019 11:05:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30754#M22327</guid>
      <dc:creator>MichaelHuntsber</dc:creator>
      <dc:date>2019-07-17T11:05:51Z</dc:date>
    </item>
    <item>
      <title>Re: How do I avoid the "No space left on device" error where my disk is running out of space?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30755#M22328</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Had to update this line&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;val spaceInGB = ("df /local_disk".!!).split(" +")(9).toInt / 1024 / 1024&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;to&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;val spaceInGB = ("df /local_disk0".!!).split(" +")(9).toInt / 1024 / 1024&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;In Databricks 7.3.&lt;/P&gt; 
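&lt;P&gt;For anyone checking from Python instead, a rough equivalent (assuming the same /local_disk0 mount) is:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import shutil

# Free space on the node-local disk, in GB
free_gb = shutil.disk_usage("/local_disk0").free / 1024 ** 3
print(f"{free_gb:.1f} GB free on /local_disk0")&lt;/CODE&gt;&lt;/PRE&gt;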
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 30 Nov 2020 23:03:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-do-i-avoid-the-quot-no-space-left-on-device-quot-error-where/m-p/30755#M22328</guid>
      <dc:creator>Capemo</dc:creator>
      <dc:date>2020-11-30T23:03:44Z</dc:date>
    </item>
  </channel>
</rss>

