hey @Punit Chauhan
Broadcast variables (BV) are used in the same way for RDD, DataFrame, and Dataset.
When you run Spark RDD or DataFrame jobs that have broadcast variables defined and used, Spark does the following (see the small sketch after the list):
- Spark breaks the job into stages separated by distributed shuffling, and actions are executed within each stage.
- Each stage is then broken into tasks.
- Spark broadcasts the common (reusable) data needed by the tasks within each stage.
- The broadcast data is cached in serialized form and deserialized before executing each task.
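Here is a minimal sketch of what that looks like in practice (the lookup map, app name, and local[*] master are just placeholders): the same broadcast handle is reused from both an RDD and a DataFrame, so each executor gets one copy of the data instead of one per task.
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object BroadcastSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext
    import spark.implicits._

    // Hypothetical lookup table that every task in the stage needs.
    val countryCodes = Map("IN" -> "India", "US" -> "United States")

    // Broadcast it once; Spark ships it to each executor in serialized form
    // and deserializes it before the tasks run.
    val bCountryCodes = sc.broadcast(countryCodes)

    // Same broadcast variable used from an RDD...
    val rdd = sc.parallelize(Seq("IN", "US", "IN"))
    val fromRdd = rdd.map(code => bCountryCodes.value.getOrElse(code, "Unknown")).collect()
    println(fromRdd.mkString(", "))

    // ...and from a DataFrame via a UDF that reads the same broadcast value.
    val toCountry = udf((code: String) => bCountryCodes.value.getOrElse(code, "Unknown"))
    val df = Seq("IN", "US").toDF("code")
    df.withColumn("country", toCountry($"code")).show()

    spark.stop()
  }
}
```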
Rishabh Pandey