Data Engineering

When to use cache vs checkpoint?

User16752240150
New Contributor II

I've seen .cache() and .checkpoint() used similarly in some workflows I've come across. What's the difference, and when should I use one over the other?

1 reply

Srikanth_Gupta_
Valued Contributor

Caching is far more useful than checkpointing as long as you have enough available memory to store your RDDs or DataFrames, which matters most when they are massive.

Caching keeps the result of your transformations so that they do not have to be recomputed when further transformations are applied to the RDD or DataFrame. When you cache, Spark still keeps the lineage (the history of transformations applied) and recomputes any cached partitions that do not fit in memory or get evicted.

When you checkpoint, Spark throws away the lineage and writes the final DataFrame to reliable storage such as HDFS, where by default the files stay after the job finishes. The main drawback of checkpointing is that writing to HDFS is slower than caching, and you also need to set up a checkpoint directory (for example with SparkContext.setCheckpointDir). persist(StorageLevel.DISK_ONLY) does something similar, but it writes to the executors' local disks and keeps the lineage.

Checkpointing is mainly used in stateful transformations that combine data across multiple batches. In such transformations, the generated RDDs depend on RDDs of previous batches, so the length of the dependency chain keeps increasing with time. To avoid such unbounded increases in recovery time, intermediate RDDs are periodically checkpointed to reliable storage, which cuts off the dependency chain.
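A toy model in plain Python can make the lineage difference concrete. It is not Spark's API: the Dataset class, its evict() method, and the dict standing in for HDFS are all hypothetical. It shows a cached dataset keeping its two transformations and replaying them after eviction, a checkpointed dataset starting over with an empty lineage, and a stateful loop whose lineage stays bounded because it is checkpointed every third batch.

```python
from typing import Callable, Dict, List, Optional

Step = Callable[[List[int]], List[int]]


class Dataset:
    """Toy stand-in for an RDD/DataFrame: source data plus the lineage derived from it."""

    def __init__(self, source: List[int], lineage: Optional[List[Step]] = None):
        self.source = source                 # where recomputation starts
        self.lineage = list(lineage or [])   # transformations replayed on recompute
        self._cache_enabled = False
        self._cached: Optional[List[int]] = None
        self.recomputations = 0              # how often the lineage was replayed

    def map(self, fn: Callable[[int], int]) -> "Dataset":
        # A transformation returns a new Dataset whose lineage is one step longer.
        return Dataset(self.source, self.lineage + [lambda xs: [fn(x) for x in xs]])

    def cache(self) -> "Dataset":
        # Lazy, as in Spark: the result is only stored on the next collect().
        self._cache_enabled = True
        return self

    def collect(self) -> List[int]:
        if self._cached is not None:
            return list(self._cached)
        data = list(self.source)
        for step in self.lineage:
            data = step(data)
        self.recomputations += 1
        if self._cache_enabled:
            self._cached = data
        return list(data)

    def evict(self) -> None:
        # Simulates memory pressure: cached data is dropped, the lineage is kept.
        self._cached = None

    def checkpoint(self, storage: Dict[str, List[int]], key: str) -> "Dataset":
        # Materialize once, save to "reliable storage", and start over with no lineage.
        storage[key] = self.collect()
        return Dataset(storage[key])


pipeline = Dataset([1, 2, 3]).map(lambda x: x + 1).map(lambda x: x * 10)

cached = pipeline.cache()
cached.collect()   # computed and cached
cached.collect()   # served from the cache
cached.evict()
cached.collect()   # lineage replayed after eviction
print(len(cached.lineage), cached.recomputations)  # 2 2

storage: Dict[str, List[int]] = {}
ckpt = pipeline.checkpoint(storage, "ckpt-1")
print(len(ckpt.lineage), ckpt.collect())  # 0 [20, 30, 40]

# Stateful loop: the lineage grows by one step per batch unless it is checkpointed.
state = Dataset([0])
for batch in range(1, 7):
    state = state.map(lambda x, b=batch: x + b)
    if batch % 3 == 0:
        state = state.checkpoint(storage, f"state-{batch}")
print(len(state.lineage), state.collect())  # 0 [21]
```

Without the checkpoint every third batch, the final state in this toy would carry six transformations; with it, recovery never has to replay more than three.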

Checkpointing is also used in streaming applications to store metadata so they can recover from failures.
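The recovery idea can be sketched the same way: after each micro-batch, record how far processing got, and on restart resume from that point. This is a hypothetical plain-Python illustration; run, load_offset, save_offset and the progress.json layout are invented for the example. Spark's own checkpoint format is different and is managed for you when you set the checkpointLocation option on a streaming query.

```python
import json
import os
import tempfile
from typing import List


def load_offset(path: str) -> int:
    # Resume point recorded by the last successful batch; 0 on a fresh start.
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_offset"]


def save_offset(path: str, next_offset: int) -> None:
    # Write a temp file, then atomically rename it over the old checkpoint,
    # so a crash mid-write never leaves a half-written file behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump({"next_offset": next_offset}, f)
    os.replace(tmp, path)


def run(source: List[int], path: str, batch_size: int, fail_after_batches: int = -1) -> List[int]:
    """Process source in micro-batches, checkpointing progress after each batch."""
    processed: List[int] = []
    offset = load_offset(path)
    batches = 0
    while offset < len(source):
        if batches == fail_after_batches:
            raise RuntimeError("simulated failure")
        batch = source[offset:offset + batch_size]
        processed.extend(x * 2 for x in batch)
        offset += len(batch)
        save_offset(path, offset)
        batches += 1
    return processed


events = list(range(10))
ckpt_path = os.path.join(tempfile.mkdtemp(), "progress.json")
try:
    run(events, ckpt_path, batch_size=3, fail_after_batches=2)
except RuntimeError:
    pass
print(load_offset(ckpt_path))                # 6
print(run(events, ckpt_path, batch_size=3))  # [12, 14, 16, 18]
```

The sketch only tracks progress: outputs from the failed run live in memory and are lost, which is why a real pipeline also needs a durable sink that tolerates replayed batches.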
