Rishabh-Pandey
Databricks MVP

Hey @Punit Chauhan,

Broadcast variables (BVs) are used the same way with RDDs, DataFrames, and Datasets.

When you run a Spark RDD or DataFrame job that defines and uses broadcast variables, Spark does the following:

  • Spark breaks the job into stages separated by distributed shuffle operations, and actions are executed within those stages.
  • Each stage is then broken into tasks.
  • Spark broadcasts the common (reusable) data needed by the tasks within each stage.
  • The broadcast data is cached in serialized form and deserialized before each task runs.
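The steps above can be modeled with a toy, plain-Python sketch. The value is serialized once with `pickle` and the bytes are cached. One "task" runs per partition, and each task deserializes its own copy before processing its records. `SimulatedBroadcast` and `run_stage` are hypothetical names. This models only the listed steps, not Spark's actual serializers, network transport, or executor-side caching.

```python
import pickle
from typing import Any, Callable, Dict, List


class SimulatedBroadcast:
    """Toy stand-in for a broadcast variable (NOT Spark's implementation)."""

    def __init__(self, value: Any) -> None:
        self._blob = pickle.dumps(value)  # serialized once and cached as bytes
        self.deserialize_count = 0

    def value_for_task(self) -> Any:
        # Each task deserializes the cached bytes before it runs.
        self.deserialize_count += 1
        return pickle.loads(self._blob)


def run_stage(
    partitions: List[List[str]],
    bc: SimulatedBroadcast,
    fn: Callable[[Dict[str, str], str], str],
) -> List[List[str]]:
    """Run one simulated task per partition within a single stage."""
    results = []
    for partition in partitions:
        lookup = bc.value_for_task()  # deserialize before executing the task
        results.append([fn(lookup, record) for record in partition])
    return results


bc = SimulatedBroadcast({"IN": "India", "US": "United States"})
partitions = [["IN", "US"], ["US", "FR"], ["IN"]]
out = run_stage(partitions, bc, lambda lookup, code: lookup.get(code, "Unknown"))
print(out)
print(bc.deserialize_count)
```

With three partitions, the data is serialized once but deserialized three times, once per task.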

Rishabh Pandey
