How does Spark do lazy evaluation?

Constantine
Contributor III

For context, I am running Spark on the Databricks platform and using Delta tables (on S3).

Let's assume we have a table called table_one. I create a view called view_one from that table and then query view_one. Next, I create another view, called view_two, based on view_one, and then query view_two. Will all the calculations for view_one be done again?

Example commands are below. In other words, when Cmd4 runs, will the query from Cmd1 be re-executed to compute Cmd4?

Cmd1:

CREATE OR REPLACE VIEW VIEW_ONE AS
SELECT
    ....
FROM
    table_one
WHERE
    .....;

Cmd2:

SELECT * FROM VIEW_ONE; 

Cmd3:

CREATE OR REPLACE VIEW VIEW_TWO AS
SELECT
    ....
FROM
    VIEW_ONE
WHERE
    .....;

Cmd4:

SELECT * FROM VIEW_TWO; 

1 ACCEPTED SOLUTION

jose_gonzalez
Moderator

Hi @John Constantine,

The following notebook URL will help you better understand the difference between lazy transformations and actions in Spark. You will be able to compare the physical query plans and better understand what happens when you execute your SQL statements.
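The transformation/action split can be illustrated with a toy model in plain Python. This is a hypothetical sketch, not Spark's API: LazyFrame, filter(), cache() and collect() are made-up names chosen to echo Spark's vocabulary, and table_one, view_one and view_two are stand-ins for the objects in the question. Defining the "views" only builds a chain of plan nodes; each collect() (an action, like running Cmd2 or Cmd4) walks the chain back to the source scan, so view_one's work happens again for view_two unless view_one was cached. Real Spark goes further: Catalyst inlines view definitions and optimizes the combined plan (for example by merging the two filters), so it does not literally replay view_one as a separate step, but the source table is still read again.

```python
from typing import Callable, List, Optional

# Records which plan steps actually executed, in order.
execution_log: List[str] = []


class LazyFrame:
    """Toy model of a lazily evaluated dataset (hypothetical, NOT the Spark API).

    A LazyFrame only records how to compute its rows; nothing runs
    until an action such as collect() is called.
    """

    def __init__(self, name: str, compute: Callable[[], List[int]]) -> None:
        self.name = name
        self._compute = compute
        self._cache_requested = False
        self._cached: Optional[List[int]] = None

    def filter(self, name: str, predicate: Callable[[int], bool]) -> "LazyFrame":
        # Transformation: returns a new plan node; no data is touched here.
        parent = self

        def compute() -> List[int]:
            rows = parent._rows()
            execution_log.append(name)
            return [row for row in rows if predicate(row)]

        return LazyFrame(name, compute)

    def cache(self) -> "LazyFrame":
        # Only marks the frame; rows are stored the first time an action
        # computes them (similar in spirit to Spark's lazy cache()).
        self._cache_requested = True
        return self

    def _rows(self) -> List[int]:
        if self._cached is not None:
            return self._cached
        rows = self._compute()
        if self._cache_requested:
            self._cached = rows
        return rows

    def collect(self) -> List[int]:
        # Action: triggers execution of the whole chain of transformations.
        return list(self._rows())


def scan_table_one() -> List[int]:
    execution_log.append("scan table_one")
    return [1, 2, 3, 4, 5, 6]


# Hypothetical stand-ins for table_one, view_one (Cmd1) and view_two (Cmd3).
table_one = LazyFrame("table_one", scan_table_one)
view_one = table_one.filter("view_one", lambda x: x % 2 == 0)
view_two = view_one.filter("view_two", lambda x: x > 2)
assert execution_log == []  # defining the "views" ran nothing

# Cmd2 / Cmd4 analogues without caching: each action re-walks the chain.
rows_one = view_one.collect()        # [2, 4, 6]
rows_two = view_two.collect()        # [4, 6]
uncached_log = list(execution_log)
# ['scan table_one', 'view_one', 'scan table_one', 'view_one', 'view_two']

# Same actions after marking view_one for caching.
execution_log.clear()
view_one.cache()
view_one.collect()                   # computes and stores view_one's rows
view_two.collect()                   # reuses the stored rows
cached_log = list(execution_log)
# ['scan table_one', 'view_one', 'view_two']
```

Comparing uncached_log with cached_log shows the point of the thread: without caching, the "scan table_one" and "view_one" steps run a second time when view_two is queried.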


4 REPLIES

Anonymous
Not applicable

Hello @John Constantine! My name is Piper and I'm a community moderator for Databricks. Welcome to the community and thank you for your question! Let's give it a while to see what other members have to say. 🙂

-werners-
Esteemed Contributor III

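Short answer: yes. The query behind view_one runs once for Cmd2 and again as part of Cmd4, because view_two's plan includes view_one's definition.

The exception is if you cache it: persist()/cache() keeps view_one's computed result around, while the Delta cache keeps local copies of the table's data files, so the re-read is faster even though the query itself still runs.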
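To see the "runs twice" behavior concretely without a Spark cluster, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine. This is an assumption for illustration: SQLite is not Spark, but its views are likewise stored queries rather than stored data. The function scan_marker() and the table view_one_cached are hypothetical names; scan_marker() simply counts how often view_one's query actually processes rows, and view_one_cached is a materialized copy that plays the role of a cache.

```python
import sqlite3

# Incremented each time SQLite evaluates the hypothetical scan_marker()
# function, i.e. each time view_one's defining query processes a row.
scan_count = 0


def scan_marker(value):
    """Count one evaluation and return the value unchanged."""
    global scan_count
    scan_count += 1
    return value


conn = sqlite3.connect(":memory:")
conn.create_function("scan_marker", 1, scan_marker)

conn.execute("CREATE TABLE table_one (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO table_one VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

# Cmd1 analogue: view_one stores a query, not data.
conn.execute(
    "CREATE VIEW view_one AS "
    "SELECT id, scan_marker(amount) AS amount FROM table_one WHERE amount > 15"
)
# Cmd3 analogue: view_two is defined on top of view_one.
conn.execute("CREATE VIEW view_two AS SELECT id FROM view_one WHERE amount > 25")

# Cmd2 analogue: querying view_one runs its defining query.
rows_one = conn.execute("SELECT * FROM view_one").fetchall()
after_view_one = scan_count

# Cmd4 analogue: querying view_two runs view_one's query again.
rows_two = conn.execute("SELECT * FROM view_two").fetchall()
after_view_two = scan_count

# Rough caching analogue: materialize view_one once, then read the stored rows.
conn.execute("CREATE TEMP TABLE view_one_cached AS SELECT * FROM view_one")
after_cache = scan_count
rows_cached = conn.execute(
    "SELECT id FROM view_one_cached WHERE amount > 25"
).fetchall()
# scan_count is unchanged here: the stored rows are read, not recomputed.
conn.close()
```

One difference from Spark: once view_one is cached there (for example with CACHE TABLE or DataFrame.cache()), Spark's cache manager substitutes the cached data automatically when it recognizes view_one's plan inside later queries, whereas in SQLite you have to query the materialized table explicitly.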

Prabakar
Esteemed Contributor III

Hi @John Constantine, for Delta caching you can refer to the doc link below:

https://docs.databricks.com/delta/optimizations/delta-cache.html

