<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Spark behavior while dealing with Actions &amp; Transformations? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/spark-behavior-while-dealing-with-actions-transformations/m-p/25434#M17683</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;My question is: what happens to the initial RDD after an action is performed on it? Does it disappear, does it stay in memory, or does it need to be explicitly cached with cache() if we want to use it again?&lt;/P&gt;&lt;P&gt;For example, if I execute this sequence:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;df_output = df_input.filter(...) --&gt; Transformation_1&lt;/P&gt;&lt;P&gt;df_output.count() --&gt; Action_1&lt;/P&gt;&lt;P&gt;df_final = df_output.filter(...) --&gt; Transformation_2&lt;/P&gt;&lt;P&gt;df_final.count() --&gt; Action_2&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;While executing Action_2, are Transformation_1 and Transformation_2 both performed again, or only Transformation_2? If only Transformation_2 runs, where is the result of Transformation_1 stored in the meantime?&lt;/P&gt;</description>
    <pubDate>Thu, 27 Oct 2022 22:20:05 GMT</pubDate>
    <dc:creator>Mradul07</dc:creator>
    <dc:date>2022-10-27T22:20:05Z</dc:date>
    <item>
      <title>Spark behavior while dealing with Actions &amp; Transformations?</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-behavior-while-dealing-with-actions-transformations/m-p/25434#M17683</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;My question is: what happens to the initial RDD after an action is performed on it? Does it disappear, does it stay in memory, or does it need to be explicitly cached with cache() if we want to use it again?&lt;/P&gt;&lt;P&gt;For example, if I execute this sequence:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;df_output = df_input.filter(...) --&gt; Transformation_1&lt;/P&gt;&lt;P&gt;df_output.count() --&gt; Action_1&lt;/P&gt;&lt;P&gt;df_final = df_output.filter(...) --&gt; Transformation_2&lt;/P&gt;&lt;P&gt;df_final.count() --&gt; Action_2&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;While executing Action_2, are Transformation_1 and Transformation_2 both performed again, or only Transformation_2? If only Transformation_2 runs, where is the result of Transformation_1 stored in the meantime?&lt;/P&gt;</description>
      <pubDate>Thu, 27 Oct 2022 22:20:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-behavior-while-dealing-with-actions-transformations/m-p/25434#M17683</guid>
      <dc:creator>Mradul07</dc:creator>
      <dc:date>2022-10-27T22:20:05Z</dc:date>
    </item>
  </channel>
</rss>

