hey @Punit Chauhan
Broadcast variables (BV) are used in the same way for RDD, DataFrame, and Dataset.
When you run Spark RDD or DataFrame jobs that have broadcast variables defined and used, Spark does the following (see the small sketch after the list):
- Spark breaks the job into stages separated by distributed shuffling, and actions are executed within each stage.
- Each stage is then broken into tasks.
- Spark broadcasts the common (reusable) data needed by the tasks within each stage.
- The broadcast data is cached in serialized form and deserialized before executing each task.
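Here is a minimal sketch of what that looks like in practice (the lookup map, app name, and local[*] master are just placeholders): the same broadcast handle is reused from both an RDD and a DataFrame, so each executor gets one copy of the data instead of one per task.
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object BroadcastSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext
    import spark.implicits._

    // Hypothetical lookup table that every task in the stage needs.
    val countryCodes = Map("IN" -> "India", "US" -> "United States")

    // Broadcast it once; Spark ships it to each executor in serialized form
    // and deserializes it before the tasks run.
    val bCountryCodes = sc.broadcast(countryCodes)

    // Same broadcast variable used from an RDD...
    val rdd = sc.parallelize(Seq("IN", "US", "IN"))
    val fromRdd = rdd.map(code => bCountryCodes.value.getOrElse(code, "Unknown")).collect()
    println(fromRdd.mkString(", "))

    // ...and from a DataFrame via a UDF that reads the same broadcast value.
    val toCountry = udf((code: String) => bCountryCodes.value.getOrElse(code, "Unknown"))
    val df = Seq("IN", "US").toDF("code")
    df.withColumn("country", toCountry($"code")).show()

    spark.stop()
  }
}
```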
Rishabh Pandey