Re: Do I have to run .cache() on my dataframe befo...

Srikanth_Gupta_ · ‎06-17-2021

It is better to use cache() when a DataFrame is used multiple times in a single pipeline.

Spark's cache() and persist() methods are an optimization mechanism: they store the intermediate result of a DataFrame computation so it can be reused in subsequent actions instead of being recomputed each time.