Data Engineering

When to persist and when to unpersist RDD in Spark

paourissi
New Contributor

Let's say I have the following:

val dataset2 = dataset1.persist(StorageLevel.MEMORY_AND_DISK)
val dataset3 = dataset2.map(.....)

1) If you do a transformation on dataset2, do you then have to persist the result (dataset3) and unpersist the previous one, or not?

2) I am trying to figure out when to persist and unpersist RDDs. Do I have to persist every new RDD that is created?

3) In order for an unpersist to take place, must an action follow? (e.g. otherRdd.count)

Thanks
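
A minimal Scala sketch of the scenario above (the local SparkContext and the data.txt input are illustrative assumptions, not from the post): persisting dataset2 does not cache dataset3 as well, and an RDD only needs to be persisted when more than one action will read it.

import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

// Illustrative setup: a local context and a toy input file.
val sc = new SparkContext("local[*]", "persist-sketch")
val dataset1 = sc.textFile("data.txt")

// persist returns the same RDD, now marked for caching.
val dataset2 = dataset1.persist(StorageLevel.MEMORY_AND_DISK)

// dataset3 is a derived RDD; it is NOT cached automatically just because
// its parent is. Persist it only if dataset3 itself feeds several actions.
val dataset3 = dataset2.map(_.length)

println(dataset2.count())  // action 1: materializes dataset2 and fills the cache
println(dataset3.sum())    // action 2: reads the cached dataset2, recomputes only the map

dataset2.unpersist()       // safe here: no later action reads dataset2 again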

2 REPLIES

Arun_KumarPT
New Contributor II

This doesn't answer any of the questions asked. The question is about unpersisting a DataFrame. The linked docs only say that it can be done, but they don't give any hints as to when it should be done. My worry is that unpersisting too soon will lead to zero cache benefit.

I assume that you should wait until after the last forced evaluation, but this isn't documented, and it's hard to reason about given that cache/unpersist are mutating operations.
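
A small sketch of that timing concern, under assumed names not taken from the thread (sc, cached): the persisted RDD is kept through every action that reads it, and unpersist is called only after the last such action, because unpersist drops the cached blocks right away rather than waiting for another action.

import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

// Assumed setup: a local context and a toy RDD standing in for something
// expensive to recompute.
val sc = new SparkContext("local[*]", "unpersist-timing")
val cached = sc.parallelize(1 to 1000000)
  .map(_ * 2)
  .persist(StorageLevel.MEMORY_AND_DISK)

val total  = cached.count()   // action 1: computes the RDD and fills the cache
val sample = cached.take(10)  // action 2: served from the cached blocks

// Unpersisting before the actions above would have given zero cache benefit.
// After the last action that reads `cached`, dropping it is safe; unpersist
// takes effect immediately and does not require a following action.
cached.unpersist()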
