
How does Spark do lazy evaluation?

Constantine
Contributor III

For context, I am running Spark on the Databricks platform and using Delta tables (on S3).

Let's assume we have a table called table_one. I create a view called view_one from that table and then query view_one. Next, I create another view, called view_two, based on view_one and then query view_two. Will all the calculations for view_one be done again?

Example commands are below. That is, when Cmd4 is called, will Cmd1 be re-executed to calculate Cmd4?

Cmd1:

CREATE OR REPLACE VIEW VIEW_ONE AS
SELECT 
     ....
FROM
    table_one
WHERE
   .....

Cmd2:

SELECT * FROM VIEW_ONE; 

Cmd3:

CREATE OR REPLACE VIEW VIEW_TWO AS
SELECT 
    ....
FROM 
  VIEW_ONE
WHERE
 .....;

Cmd4:

SELECT * FROM VIEW_TWO; 

1 ACCEPTED SOLUTION

Accepted Solutions

jose_gonzalez
Databricks Employee

Hi @John Constantine,

The following notebook URL will help you better understand the difference between lazy transformations and actions in Spark. You will be able to compare the physical query plans and see what is going on when you execute your SQL statements.
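
As a minimal sketch of the plan comparison described above, assuming the views from Cmd1 and Cmd3 already exist: EXPLAIN prints the plan for a statement without executing it. Because a view stores only its query text, the plan for VIEW_TWO would be expected to contain the scan and filter on table_one that come from VIEW_ONE's definition.

```sql
-- Plan for Cmd2. EXPLAIN only prints the plan; it does not run the query.
EXPLAIN FORMATTED SELECT * FROM VIEW_ONE;

-- Plan for Cmd4. Look for the table_one scan and VIEW_ONE's filter
-- appearing again inside this plan.
EXPLAIN FORMATTED SELECT * FROM VIEW_TWO;
```

If the two plans share the same table_one scan and filter steps, that is a sign VIEW_ONE's logic is recomputed whenever VIEW_TWO is queried.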


4 REPLIES

Anonymous
Not applicable

Hello @John Constantine! My name is Piper and I'm a community moderator for Databricks. Welcome to the community and thank you for your question! Let's give it a while to see what other members have to say. 🙂

-werners-
Esteemed Contributor III

Short answer: yes. Spark will run view_one twice, unless you cache it (by using the Delta cache or persist()/cache()).
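
A rough SQL counterpart to persist()/cache(), as a sketch that assumes VIEW_ONE and VIEW_TWO from the question exist. CACHE TABLE stores the current result of the view on the cluster, so later queries that read VIEW_ONE, directly or through VIEW_TWO, should be able to reuse it instead of recomputing from table_one.

```sql
-- Compute VIEW_ONE once and keep the result in Spark's cache.
-- CACHE TABLE is eager by default; use CACHE LAZY TABLE to defer
-- the work until the first query that needs it.
CACHE TABLE VIEW_ONE;

-- Cmd4 again: the part of the plan that matches VIEW_ONE can now be
-- served from the cached data.
SELECT * FROM VIEW_TWO;

-- Release the cached data when it is no longer needed.
UNCACHE TABLE VIEW_ONE;
```

The cached data lives on the cluster and does not survive a restart, so this helps most when the same view is queried repeatedly within one session.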

Prabakar
Databricks Employee

Hi @John Constantine, for Delta caching you can refer to the doc link below.

https://docs.databricks.com/delta/optimizations/delta-cache.html
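
For illustration, a minimal sketch of turning on the Delta cache from SQL, assuming a Databricks cluster whose worker type supports it. Note that the Delta cache keeps local copies of the remote data files rather than query results, so VIEW_ONE's logic would still run on each query; only the reads from S3 get faster.

```sql
-- Enable the Delta cache for the current session, in case the cluster
-- does not enable it by default.
SET spark.databricks.io.cache.enabled = true;

-- Optionally pre-load table_one's data into the Delta cache
-- (CACHE SELECT is a Databricks-specific statement).
CACHE SELECT * FROM table_one;
```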
