Re: Where can we use Broadcast variable?

Rishabh-Pandey · 12-27-2022

Hey @Punit Chauhan,

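Broadcast variables (BVs) are used in the same way for RDDs, DataFrames, and Datasets: you create one with SparkContext.broadcast(value) and read it inside your tasks through its .value attribute.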

When you run a Spark RDD or DataFrame job that defines and uses broadcast variables, Spark does the following:

Spark breaks the job into stages separated by distributed shuffle operations, and actions are executed within those stages.
Each stage is then broken into tasks.
Spark broadcasts the common (reusable) data needed by the tasks within each stage.
The broadcast data is cached in serialized form and deserialized before each task runs.
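
As a rough sketch of the above in PySpark, here is one broadcast variable used from both an RDD and a DataFrame. The lookup dictionary, the sample rows, and the function names are all made up for illustration, and it assumes PySpark is installed and can run in local mode.

```python
# Hypothetical lookup data, made up for this illustration.
STATE_NAMES = {"NY": "New York", "CA": "California", "FL": "Florida"}


def full_state_name(code, mapping):
    """Return the full name for a state code, or the code itself if unknown."""
    return mapping.get(code, code)


def run_broadcast_example():
    # Imported here so the helper above can be used without a Spark install.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = (
        SparkSession.builder
        .master("local[2]")
        .appName("broadcast-sketch")
        .getOrCreate()
    )
    sc = spark.sparkContext

    # Create the broadcast variable once on the driver.
    states_bv = sc.broadcast(STATE_NAMES)

    rows = [("James", "NY"), ("Maria", "CA"), ("Robert", "FL")]

    # RDD: tasks read the broadcast data through .value
    rdd_result = (
        sc.parallelize(rows)
        .map(lambda r: (r[0], full_state_name(r[1], states_bv.value)))
        .collect()
    )
    print(rdd_result)

    # DataFrame: the same broadcast variable, read inside a UDF.
    to_name = udf(
        lambda code: full_state_name(code, states_bv.value), StringType()
    )
    df = spark.createDataFrame(rows, ["name", "state"])
    df.withColumn("state_name", to_name(df["state"])).show()

    spark.stop()
    return rdd_result
```

Calling run_broadcast_example() should print each name paired with its full state name, first from the RDD and then as a DataFrame column. The point is that the dictionary is wrapped once with sc.broadcast(...) and both code paths read it the same way via .value.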

Rishabh Pandey