
When to persist and when to unpersist RDD in Spark

paourissi
New Contributor

Let's say I have the following:

<code>val dataset2 = dataset1.persist(StorageLevel.MEMORY_AND_DISK)

val dataset3 = dataset2.map(.....)</code>

1) If you apply a transformation to dataset2, do you have to persist the result (dataset3) and unpersist the previous one (dataset2), or not?

2) I am trying to figure out when to persist and unpersist RDDs. Do I have to persist every new RDD that is created?

3) For an unpersist to take effect, does an action have to follow it (e.g. otherrdd.count)?

Thanks

2 REPLIES

Arun_KumarPT
New Contributor II

This doesn't answer any of the questions asked. This question is about unpersisting a data frame. The linked docs only say that it can be done, but don't give any hints as to when it should be done. My worry is that unpersisting too soon will lead to zero cache benefits.

I assume that you should wait until after the last forced evaluation (the last action that reads the cached data), but it's not documented, and it's hard to reason about given that cache/unpersist mutate the object they are called on.
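
To make that assumption concrete, here is a minimal sketch using the RDD API from the original question. It relies on two behaviours of that API: persist() is lazy (it only marks the RDD, and the data is cached when the first action computes it), while unpersist() takes effect when it is called and does not need a following action. The object name, the demo method, the sample data, and the local master setting are all made up for illustration; they are not from the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical example object; names and data are illustrative only.
object PersistLifecycle {

  // Returns (count of doubled, count of evens, storage level after unpersist).
  def demo(spark: SparkSession): (Long, Long, StorageLevel) = {
    val sc = spark.sparkContext
    val dataset1 = sc.parallelize(1 to 1000)

    // persist() is lazy: it only marks the RDD, nothing is cached yet.
    val dataset2 = dataset1.persist(StorageLevel.MEMORY_AND_DISK)

    // Two derived RDDs reuse dataset2. Neither is persisted itself,
    // because each one is consumed by a single action.
    val doubled = dataset2.map(_ * 2)
    val evens = dataset2.filter(_ % 2 == 0)

    val doubledCount = doubled.count() // first action: computes and caches dataset2
    val evenCount = evens.count()      // second action: can read dataset2 from the cache

    // Unpersist only after the last action that needs the cached data.
    // unpersist() is not lazy, so no action has to follow it.
    dataset2.unpersist(blocking = true)

    (doubledCount, evenCount, dataset2.getStorageLevel)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("persist-lifecycle")
      .master("local[*]")
      .getOrCreate()
    try {
      val (doubledCount, evenCount, level) = demo(spark)
      println(s"doubledCount=$doubledCount evenCount=$evenCount levelAfterUnpersist=$level")
    } finally {
      spark.stop()
    }
  }
}
```

If unpersist() were moved above the two count() calls, both actions would recompute dataset1 from scratch, which is the "zero cache benefits" case. In this sketch, persisting only pays off because dataset2 feeds more than one action; an RDD that is read by a single action gains nothing from persist().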
