12-27-2022 09:53 PM
Accepted Solutions
12-27-2022 11:50 PM
Hey @Punit Chauhan,
Broadcast variables (BVs) are used in the same way for RDDs, DataFrames, and Datasets.
When you run a Spark RDD or DataFrame job that defines and uses broadcast variables, Spark does the following:
- Spark breaks the job into stages separated by distributed shuffle operations, and actions are executed within those stages.
- Each stage is then broken into tasks.
- Spark broadcasts the common (reusable) data needed by the tasks within each stage.
- The broadcast data is cached in serialized form and deserialized before each task runs.
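
To make the last two points concrete, here is a minimal, Spark-free sketch in plain Python. It is only a toy model of the behavior described above, not Spark's actual implementation: `ToyBroadcast`, `run_stage`, and the sample data are hypothetical names made up for illustration. In real PySpark code you would create the variable with `SparkContext.broadcast(value)` and read it inside tasks through its `.value` attribute.

```python
import pickle


class ToyBroadcast:
    """Hypothetical stand-in for a broadcast variable (not a Spark class)."""

    def __init__(self, value):
        # The value is serialized once, when the variable is created.
        self._blob = pickle.dumps(value)

    @property
    def value(self):
        # The cached bytes are deserialized whenever a task reads the value.
        return pickle.loads(self._blob)


def run_stage(partitions, bv):
    """Run one toy 'stage': one task per partition, all sharing the same bv."""
    results = []
    for part in partitions:
        lookup = bv.value  # deserialize before the task runs
        results.append([lookup.get(code, "unknown") for code in part])
    return results


# Small, reusable lookup data that every task needs.
country_names = {"IN": "India", "US": "United States"}
bv = ToyBroadcast(country_names)

# Two partitions, so the stage consists of two tasks.
partitions = [["IN", "US"], ["US", "FR"]]
result = run_stage(partitions, bv)
print(result)  # [['India', 'United States'], ['United States', 'unknown']]
```

The point of the sketch is that the lookup table is serialized a single time and every task reads from that cached copy, instead of the data being shipped again with each task.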
Rishabh Pandey