Srikanth_Gupta_
Databricks Employee

It is better to cache a DataFrame when it is used multiple times in a single pipeline.

With the cache() and persist() methods, Spark provides an optimization mechanism that stores the intermediate result of a DataFrame computation so that it can be reused in subsequent actions instead of being recomputed each time.
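
As a rough illustration, here is a minimal PySpark sketch of caching a DataFrame that feeds two actions. The function names `summarize_with_cache` and `demo`, the column name `id`, and the even-number filter are all made up for this example; adapt them to your own pipeline.

```python
def summarize_with_cache(df):
    """Run two actions on one DataFrame, caching it so it is computed once.

    Hypothetical helper: assumes `df` has a numeric column named "id".
    """
    # cache() is lazy: it only marks the DataFrame for caching.
    cached = df.cache()
    try:
        # The first action materializes the cached data.
        total = cached.count()
        # The second action can read the cached data instead of
        # recomputing the whole lineage of `df`.
        evens = cached.filter("id % 2 = 0").count()
    finally:
        # Release the cached blocks once the DataFrame is no longer needed.
        cached.unpersist()
    return total, evens


def demo():
    """Illustrative driver; needs pyspark and a local JVM to run."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("cache-sketch")
        .getOrCreate()
    )
    df = spark.range(1000)  # one column named "id"
    print(summarize_with_cache(df))
    spark.stop()
```

Calling `demo()` on a machine with PySpark installed runs the sketch end to end. If you need control over where the data is kept, `persist()` takes a storage level, for example `df.persist(StorageLevel.MEMORY_AND_DISK)` with `StorageLevel` imported from `pyspark`. Caching only pays off when the DataFrame really is reused, so it is usually worth calling `unpersist()` once you are done with it.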