Data Engineering

Where can we use broadcast variables?

Prototype998
New Contributor III

What are the best situations in which to use broadcast variables?

1 ACCEPTED SOLUTION

Accepted Solutions

Rishabh264
Honored Contributor II

Hey @Punit Chauhan,

Broadcast variables (BVs) are used in the same way for RDDs, DataFrames, and Datasets.

When you run a Spark RDD or DataFrame job that defines and uses broadcast variables, Spark does the following:

  • Spark breaks the job into stages separated by distributed shuffles, and the computation runs within each stage.
  • Each stage is then broken into tasks.
  • Spark broadcasts the common (reusable) data needed by the tasks within each stage.
  • The broadcast data is cached in serialized form and deserialized before each task runs.
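
To make the steps above concrete, here is a minimal PySpark sketch of a typical case: a small, read-only lookup table that many tasks need. The table `STATE_NAMES`, the helper `expand_state`, and the wrapper `run_with_spark` are hypothetical names made up for this illustration, and it assumes PySpark is installed and a local session is acceptable. Treat it as a sketch, not as the only way to do it.

```python
from typing import Dict, List, Tuple

# Hypothetical small lookup table. Small, read-only data shared by many
# tasks is the usual candidate for a broadcast variable.
STATE_NAMES: Dict[str, str] = {"NY": "New York", "CA": "California", "FL": "Florida"}


def expand_state(record: Tuple[str, str], states: Dict[str, str]) -> Tuple[str, str]:
    """Replace the state code in a (name, code) record with the full state name.

    Unknown codes are returned unchanged.
    """
    name, code = record
    return (name, states.get(code, code))


def run_with_spark(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Apply expand_state on a local Spark session (assumes PySpark is installed)."""
    # Imported lazily so that expand_state can be used without Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]").appName("broadcast-sketch").getOrCreate()
    )
    try:
        sc = spark.sparkContext
        # Broadcast the table so executors cache a copy instead of
        # receiving it with every task.
        states_bc = sc.broadcast(STATE_NAMES)
        rdd = sc.parallelize(records)
        # Inside the task, read the cached copy through .value.
        return rdd.map(lambda rec: expand_state(rec, states_bc.value)).collect()
    finally:
        spark.stop()
```

To try it, call `run_with_spark([("James", "NY"), ("Maria", "CA")])` on a machine with PySpark. The lookup logic lives in a plain function so it can be checked without starting Spark.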


