<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How does Spark do lazy evaluation? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-does-spark-do-lazy-evaluation/m-p/35125#M25798</link>
    <description>&lt;P&gt;For context, I am running Spark on the Databricks platform and using Delta tables stored on S3.&lt;/P&gt;&lt;P&gt;Let's assume we have a table called &lt;B&gt;&lt;I&gt;table_one&lt;/I&gt;&lt;/B&gt;. I create a view called &lt;I&gt;view_one&lt;/I&gt; from the table and then query &lt;I&gt;view_one&lt;/I&gt;. Next, I create another view, called &lt;I&gt;view_two&lt;/I&gt;, based on view_one, and then query view_two. Will all the calculations for &lt;I&gt;view_one&lt;/I&gt; be done again?&lt;/P&gt;&lt;P&gt;Example commands are below, i.e. when cmd4 is called, will cmd1 be re-executed to compute cmd4?&lt;/P&gt;&lt;P&gt;Cmd1:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;CREATE OR REPLACE VIEW VIEW_ONE AS
SELECT
    ....
FROM
    table_one
WHERE
    .....;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Cmd2:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;SELECT * FROM VIEW_ONE;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Cmd3:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;CREATE OR REPLACE VIEW VIEW_TWO AS
SELECT
    ....
FROM
    VIEW_ONE
WHERE
    .....;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Cmd4:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;SELECT * FROM VIEW_TWO;&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Sat, 13 Nov 2021 17:40:42 GMT</pubDate>
    <dc:creator>Constantine</dc:creator>
    <dc:date>2021-11-13T17:40:42Z</dc:date>
    <item>
      <title>How does Spark do lazy evaluation?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-does-spark-do-lazy-evaluation/m-p/35125#M25798</link>
      <description>&lt;P&gt;For context, I am running Spark on the Databricks platform and using Delta tables stored on S3.&lt;/P&gt;&lt;P&gt;Let's assume we have a table called &lt;B&gt;&lt;I&gt;table_one&lt;/I&gt;&lt;/B&gt;. I create a view called &lt;I&gt;view_one&lt;/I&gt; from the table and then query &lt;I&gt;view_one&lt;/I&gt;. Next, I create another view, called &lt;I&gt;view_two&lt;/I&gt;, based on view_one, and then query view_two. Will all the calculations for &lt;I&gt;view_one&lt;/I&gt; be done again?&lt;/P&gt;&lt;P&gt;Example commands are below, i.e. when cmd4 is called, will cmd1 be re-executed to compute cmd4?&lt;/P&gt;&lt;P&gt;Cmd1:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;CREATE OR REPLACE VIEW VIEW_ONE AS
SELECT
    ....
FROM
    table_one
WHERE
    .....;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Cmd2:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;SELECT * FROM VIEW_ONE;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Cmd3:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;CREATE OR REPLACE VIEW VIEW_TWO AS
SELECT
    ....
FROM
    VIEW_ONE
WHERE
    .....;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Cmd4:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;SELECT * FROM VIEW_TWO;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 13 Nov 2021 17:40:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-does-spark-do-lazy-evaluation/m-p/35125#M25798</guid>
      <dc:creator>Constantine</dc:creator>
      <dc:date>2021-11-13T17:40:42Z</dc:date>
    </item>
    <item>
      <title>Re: How does Spark do lazy evaluation?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-does-spark-do-lazy-evaluation/m-p/35126#M25799</link>
      <description>&lt;P&gt;Hello @John Constantine! My name is Piper and I'm a community moderator for Databricks. Welcome to the community and thank you for your question! Let's give it a while to see what other members have to say. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 13 Nov 2021 20:14:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-does-spark-do-lazy-evaluation/m-p/35126#M25799</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-11-13T20:14:08Z</dc:date>
    </item>
    <item>
      <title>Re: How does Spark do lazy evaluation?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-does-spark-do-lazy-evaluation/m-p/35127#M25800</link>
      <description>&lt;P&gt;Short answer: yes, Spark will run the query behind view_one twice.&lt;/P&gt;&lt;P&gt;Unless you cache it (using the Delta cache or persist()/cache()).&lt;/P&gt;</description>
      <pubDate>Sun, 14 Nov 2021 06:57:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-does-spark-do-lazy-evaluation/m-p/35127#M25800</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-11-14T06:57:26Z</dc:date>
    </item>
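The recompute-unless-cached behavior described in the reply above can be sketched with standard Spark SQL caching statements, using the view names from the thread's example (a minimal sketch; in Spark 3.x, CACHE TABLE is eager by default and the LAZY keyword defers the fill):

```sql
-- Without caching, every query against VIEW_TWO re-runs the full plan,
-- including the SELECT that defines VIEW_ONE: views store no data.

-- Materialize VIEW_ONE's result in the cluster cache.
CACHE TABLE VIEW_ONE;

SELECT * FROM VIEW_ONE;   -- Cmd2: served from the cached result
SELECT * FROM VIEW_TWO;   -- Cmd4: reads VIEW_ONE's cached result
                          -- instead of rescanning table_one

-- Release the cached data when it is no longer needed.
UNCACHE TABLE VIEW_ONE;
```

Note that this trades freshness for speed: the cached result will not reflect later writes to table_one until the cache is refreshed or dropped.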
    <item>
      <title>Re: How does Spark do lazy evaluation?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-does-spark-do-lazy-evaluation/m-p/35128#M25801</link>
      <description>&lt;P&gt;Hi @John Constantine, for Delta caching you can refer to the doc link below.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/delta/optimizations/delta-cache.html" alt="https://docs.databricks.com/delta/optimizations/delta-cache.html" target="_blank"&gt;https://docs.databricks.com/delta/optimizations/delta-cache.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 15 Nov 2021 17:33:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-does-spark-do-lazy-evaluation/m-p/35128#M25801</guid>
      <dc:creator>Prabakar</dc:creator>
      <dc:date>2021-11-15T17:33:23Z</dc:date>
    </item>
    <item>
      <title>Re: How does Spark do lazy evaluation?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-does-spark-do-lazy-evaluation/m-p/35129#M25802</link>
      <description>&lt;P&gt;Hi @John Constantine,&lt;/P&gt;&lt;P&gt;The following notebook &lt;A href="https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/346304/2168141618055043/484361/latest.html" alt="https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/346304/2168141618055043/484361/latest.html" target="_blank"&gt;url&lt;/A&gt; will help you understand the difference between lazy transformations and actions in Spark. You will be able to compare the physical query plans and better understand what is going on when you execute your SQL statements.&lt;/P&gt;</description>
      <pubDate>Mon, 15 Nov 2021 19:08:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-does-spark-do-lazy-evaluation/m-p/35129#M25802</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-11-15T19:08:43Z</dc:date>
    </item>
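The plan comparison suggested in the reply above can also be done directly in Spark SQL with EXPLAIN, which prints the query plan without executing it (a sketch using the thread's view names):

```sql
-- Show the physical plan Spark would execute for Cmd4.
-- The scan of table_one from VIEW_ONE's definition appears inline,
-- because views are expanded into the plan rather than read as stored data.
EXPLAIN SELECT * FROM VIEW_TWO;

-- EXPLAIN EXTENDED additionally shows the parsed, analyzed, and
-- optimized logical plans that lead to the physical plan.
EXPLAIN EXTENDED SELECT * FROM VIEW_TWO;
```

If VIEW_ONE has been cached beforehand, the plan will show an in-memory or cached scan in place of the table_one scan, which makes the recompute-vs-reuse question answerable by inspection.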
  </channel>
</rss>

