Data Engineering

Spark behavior when dealing with Actions & Transformations?

Mradul07
New Contributor II

Hi, 

My question is: what happens to the initial RDD after an action is performed on it? Does it disappear, does it stay in memory, or does it need to be explicitly cached with cache() if we want to use it again?

For example, if I execute this sequence:

df_output = df_input.filter(...)  # Transformation_1

df_output.count()  # Action_1

df_final = df_output.filter(...)  # Transformation_2

df_final.count()  # Action_2

While executing Action_2, are Transformation_1 and Transformation_2 both performed again, or only Transformation_2? If only Transformation_2 runs, where is the result of Transformation_1 stored in the meantime?
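
To make the scenario concrete, here is a minimal pure-Python sketch of lazy evaluation. It is a toy model, not Spark's API: the `LazyDataset` class, its `log` list, and the `run` helper are all hypothetical names invented for illustration. It models the generally documented Spark behaviour that transformations only record lineage, that each action replays that lineage from the source, and that an intermediate result is only kept when caching has been requested (in PySpark the corresponding real calls would be `DataFrame.filter`, `DataFrame.count`, and `DataFrame.cache`).

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark's API)."""

    def __init__(self, source=None, parent=None, predicate=None, name="source", log=None):
        self._source = source          # only set on the root dataset
        self._parent = parent          # lineage: the dataset this one derives from
        self._predicate = predicate    # filter condition recorded by a transformation
        self._name = name
        self._log = log if log is not None else []  # records each executed transformation
        self._cache_requested = False
        self._cached_rows = None

    def filter(self, predicate, name):
        # Transformation: nothing is computed here, only the lineage is recorded.
        return LazyDataset(parent=self, predicate=predicate, name=name, log=self._log)

    def cache(self):
        # Lazy as well: rows are stored the first time an action computes them.
        self._cache_requested = True
        return self

    def _compute(self):
        if self._cached_rows is not None:
            return self._cached_rows              # reuse the stored result
        if self._parent is None:
            rows = list(self._source)             # read the source
        else:
            parent_rows = self._parent._compute() # replay upstream lineage first
            self._log.append(self._name)          # this transformation really runs now
            rows = [r for r in parent_rows if self._predicate(r)]
        if self._cache_requested:
            self._cached_rows = rows
        return rows

    def count(self):
        # Action: triggers evaluation of the whole lineage.
        return len(self._compute())


def run(use_cache):
    """Replays the sequence from the question, optionally caching df_output."""
    log = []
    df_input = LazyDataset(source=range(10), log=log)
    df_output = df_input.filter(lambda x: x % 2 == 0, "transformation_1")
    if use_cache:
        df_output.cache()
    n1 = df_output.count()                                           # Action_1
    df_final = df_output.filter(lambda x: x > 4, "transformation_2")
    n2 = df_final.count()                                            # Action_2
    return n1, n2, log


if __name__ == "__main__":
    print(run(use_cache=False))  # transformation_1 appears twice in the log
    print(run(use_cache=True))   # transformation_1 appears once in the log
```

In this model, without caching, Action_2 re-runs Transformation_1 before Transformation_2, because nothing from Action_1 was kept. With `cache()` requested on `df_output`, Action_1 stores the filtered rows and Action_2 only runs Transformation_2 on top of them. Real Spark adds details this sketch ignores, such as partitions, storage levels, and eviction of cached data under memory pressure.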

0 Replies
