<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: When to persist and when to unpersist RDD in Spark in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/when-to-persist-and-when-to-unpersist-rdd-in-spark/m-p/29995#M21682</link>
    <description>&lt;P&gt;This doesn't answer any of the questions asked. This question is about &lt;B&gt;unpersisting&lt;/B&gt; a data frame. The linked docs only say that it can be done; they don't give any hint as to &lt;B&gt;when it should be done&lt;/B&gt;. My worry is that unpersisting too soon will eliminate any benefit from caching.&lt;/P&gt;
&lt;P&gt;I assume that you should wait until after the last action that forces evaluation, but that isn't documented, and it's hard to reason about given that cache and unpersist are mutating operations.&lt;/P&gt;</description>
    <pubDate>Mon, 04 Nov 2019 19:14:24 GMT</pubDate>
    <dc:creator>TimKellogg</dc:creator>
    <dc:date>2019-11-04T19:14:24Z</dc:date>
    <item>
      <title>When to persist and when to unpersist RDD in Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/when-to-persist-and-when-to-unpersist-rdd-in-spark/m-p/29993#M21680</link>
      <description>&lt;P&gt;Let's say I have the following:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import org.apache.spark.storage.StorageLevel

// dataset1 is an existing RDD; the map function is elided
val dataset2 = dataset1.persist(StorageLevel.MEMORY_AND_DISK)
val dataset3 = dataset2.map(.....)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;1) If I apply a transformation to dataset2, do I have to persist the result (dataset3) and unpersist the previous RDD (dataset2)?&lt;/P&gt;
&lt;P&gt;2) I am trying to figure out when to persist and unpersist RDDs. Do I have to persist every new RDD that is created?&lt;/P&gt;
&lt;P&gt;3) For an unpersist to take effect, must it be followed by an action (e.g. otherrdd.count)?&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Sun, 22 Nov 2015 21:03:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-to-persist-and-when-to-unpersist-rdd-in-spark/m-p/29993#M21680</guid>
      <dc:creator>paourissi</dc:creator>
      <dc:date>2015-11-22T21:03:31Z</dc:date>
    </item>
    <item>
      <title>Re: When to persist and when to unpersist RDD in Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/when-to-persist-and-when-to-unpersist-rdd-in-spark/m-p/29994#M21681</link>
      <description>&lt;P&gt;It is well documented here: &lt;A href="http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence" target="test_blank"&gt;http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Nov 2015 06:10:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-to-persist-and-when-to-unpersist-rdd-in-spark/m-p/29994#M21681</guid>
      <dc:creator>Arun_KumarPT</dc:creator>
      <dc:date>2015-11-25T06:10:50Z</dc:date>
    </item>
    <item>
      <title>Re: When to persist and when to unpersist RDD in Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/when-to-persist-and-when-to-unpersist-rdd-in-spark/m-p/29995#M21682</link>
      <description>&lt;P&gt;This doesn't answer any of the questions asked. This question is about &lt;B&gt;unpersisting&lt;/B&gt; a data frame. The linked docs only say that it can be done; they don't give any hint as to &lt;B&gt;when it should be done&lt;/B&gt;. My worry is that unpersisting too soon will eliminate any benefit from caching.&lt;/P&gt;
&lt;P&gt;I assume that you should wait until after the last action that forces evaluation, but that isn't documented, and it's hard to reason about given that cache and unpersist are mutating operations.&lt;/P&gt;
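&lt;P&gt;A minimal sketch of that pattern (persist, run every action that needs the cached data, then unpersist), assuming Spark 2.x or later in local mode. The object PersistLifecycle, its run helper, the evaluated accumulator, the local[2] master and the 1 to 1000 data are all hypothetical and chosen only for illustration; RDDs are used to match the original question. The counts assume no task retries and enough memory to hold the cached partitions.&lt;/P&gt;

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical example object; not part of any Spark API.
object PersistLifecycle {

  // Runs the persist / action / unpersist sequence and returns how many
  // times the map function had executed after each step.
  def run(sc: SparkContext): Seq[Long] = {
    // Hypothetical accumulator that counts evaluations of the map function.
    val evaluated = sc.longAccumulator("evaluated")

    val dataset1 = sc.parallelize(1 to 1000, 4).map { x =>
      evaluated.add(1)
      x * 2
    }

    // persist() is lazy: it only marks dataset1 for caching and returns the
    // same RDD object, so dataset2 eq dataset1. Nothing is computed yet.
    val dataset2 = dataset1.persist(StorageLevel.MEMORY_AND_DISK)
    require(dataset2 eq dataset1)
    val afterPersist = evaluated.sum

    // The first action computes the partitions and stores them.
    dataset2.count()
    val afterFirstAction = evaluated.sum

    // dataset3 feeds a single action here, so it is not persisted itself.
    // Its action reads dataset2's cached partitions instead of re-running map.
    val dataset3 = dataset2.map(_ + 1)
    dataset3.reduce(_ + _)
    val afterSecondAction = evaluated.sum

    // Unpersist only after the last action that needs the cached data.
    // The storage level is reset at once; no further action is required.
    dataset2.unpersist()
    require(dataset2.getStorageLevel == StorageLevel.NONE)

    // Any later action recomputes the lineage from scratch.
    dataset3.count()
    val afterUnpersist = evaluated.sum

    Seq(afterPersist, afterFirstAction, afterSecondAction, afterUnpersist)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, for illustration only.
    val conf = new SparkConf().setMaster("local[2]").setAppName("persist-lifecycle")
    val sc = new SparkContext(conf)
    try println(run(sc).mkString(", "))
    finally sc.stop()
  }
}
```

&lt;P&gt;Under those assumptions, main should print 0, 1000, 1000, 2000: nothing is evaluated at persist time, the second action reuses the cache, and the action after unpersist recomputes everything. That last jump is the cost of unpersisting too early.&lt;/P&gt;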
</description>
      <pubDate>Mon, 04 Nov 2019 19:14:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-to-persist-and-when-to-unpersist-rdd-in-spark/m-p/29995#M21682</guid>
      <dc:creator>TimKellogg</dc:creator>
      <dc:date>2019-11-04T19:14:24Z</dc:date>
    </item>
  </channel>
</rss>

