Data Engineering

How does Spark do lazy evaluation?

Constantine
Contributor III

For context, I am running Spark on the Databricks platform and using Delta tables (on S3).

Let's assume we have a table called table_one. I create a view called view_one from the table and then query view_one. Next, I create another view, view_two, based on view_one and then query view_two. Will all the calculations for view_one be done again?

Example commands are below. That is, when Cmd4 is run, will the query defined in Cmd1 be re-executed to compute the result?

Cmd1:

CREATE OR REPLACE VIEW VIEW_ONE AS
SELECT 
     ....
FROM
    table_one
WHERE
   .....

Cmd2:

SELECT * FROM VIEW_ONE; 

Cmd3:

CREATE OR REPLACE VIEW VIEW_TWO AS
SELECT 
    ....
FROM 
  VIEW_ONE
WHERE
 .....;

Cmd4:

SELECT * FROM VIEW_TWO; 

1 ACCEPTED SOLUTION

Accepted Solutions

jose_gonzalez
Moderator

Hi @John Constantine,

The following notebook URL will help you better understand the difference between lazy transformations and actions in Spark. You will be able to compare the physical query plans and see what is going on when you execute your SQL statements.
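To make the transformation/action distinction concrete, here is a minimal toy model in plain Python. It is not Spark and not Spark's API: the `LazyFrame` class, its methods, and the `is_even` helper are hypothetical names made up for this sketch. It only illustrates the general idea that a transformation records a step in a plan, while an action is what actually runs the plan.

```python
class LazyFrame:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark's API)."""

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = plan  # recorded steps, not yet executed

    def filter(self, predicate, label):
        # "Transformation": only records the step and returns a new LazyFrame.
        return LazyFrame(self._rows, self._plan + (("filter", label, predicate),))

    def explain(self):
        # Describes the recorded plan without running it.
        return [f"{kind}: {label}" for kind, label, _ in self._plan]

    def collect(self):
        # "Action": runs every recorded step against the source rows now.
        rows = list(self._rows)
        for _, _, predicate in self._plan:
            rows = [r for r in rows if predicate(r)]
        return rows


evaluated = []  # every row the predicate was actually called on


def is_even(x):
    evaluated.append(x)
    return x % 2 == 0


frame = LazyFrame(range(6)).filter(is_even, "x is even")
plan = frame.explain()          # inspects the plan; the predicate has not run
calls_before = len(evaluated)   # still 0 at this point
result = frame.collect()        # the predicate runs here, once per row
```

In real Spark the closest counterparts would be inspecting a query with EXPLAIN (or `explain()` on a DataFrame) versus actually running a SELECT, but the plan Spark shows is far richer than this list of labels.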


4 REPLIES 4

Anonymous
Not applicable

Hello @John Constantine! My name is Piper and I'm a community moderator for Databricks. Welcome to the community and thank you for your question! Let's give it a while to see what other members have to say. 🙂

-werners-
Esteemed Contributor III

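Short answer: yes. A view stores only its query, not its results, so Spark will run view_one's query twice: once for Cmd2 and again for Cmd4.

That is, unless you cache it (by using the Delta cache or persist()/cache()).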
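As a rough illustration of why the view is recomputed and what caching changes, here is a toy model in plain Python. It is an analogy only, assuming a view behaves like a function that re-reads its source on every call; `TABLE_ONE`, `scans`, and `cached_view_one` are hypothetical names, and `functools.lru_cache` merely stands in for the idea of caching. Spark's persist()/cache() and the Delta cache work differently under the hood.

```python
from functools import lru_cache

TABLE_ONE = [1, 2, 3, 4, 5, 6]   # stand-in for table_one
scans = {"count": 0}             # how many times the base table is read


def view_one():
    # Like a plain view: a stored query that re-reads table_one on every call.
    scans["count"] += 1
    return [x for x in TABLE_ONE if x % 2 == 0]


def view_two(source=view_one):
    # Defined on top of view_one, so evaluating it evaluates its source too.
    return [x for x in source() if x > 2]


# Without caching: Cmd2 and Cmd4 each trigger a read of the base table.
view_one()
view_two()
scans_without_cache = scans["count"]


@lru_cache(maxsize=None)
def cached_view_one():
    # Computes view_one once and reuses the stored result afterwards.
    return tuple(view_one())


# With caching: the second query reuses the stored result.
scans["count"] = 0
cached_view_one()
rows_from_view_two = view_two(source=cached_view_one)
scans_with_cache = scans["count"]
```

Here the uncached path reads the base table twice and the cached path reads it once. Whether caching pays off in practice depends on how expensive the view's query is and how often it is reused.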

Prabakar
Esteemed Contributor III

Hi @John Constantine, for Delta caching you can refer to the doc link below.

https://docs.databricks.com/delta/optimizations/delta-cache.html
