11-27-2025 10:10 PM
What Spark Does During a Broadcast Join:
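- Spark identifies the smaller table (say 80MB) from its size estimate. It broadcasts automatically only when that estimate is below spark.sql.autoBroadcastJoinThreshold (10MB by default), so an 80MB table needs a raised threshold or an explicit broadcast hint.
- The driver collects this small table into its own JVM, so the whole table has to fit in driver memory.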
- The driver serializes the table into a broadcast variable.
- The broadcast variable is shipped to all executors.
- Executors store it inside the BlockManager storage region.
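- Each executor keeps it in memory as a hash map keyed on the join columns and probes it while scanning its own partitions of the larger table, so the larger table is never shuffled.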
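
The steps above can be sketched as a toy model in plain Python. This is not Spark code and does not use any Spark API: the function name `broadcast_hash_join`, the sample tables, and the use of `pickle` for serialization are all assumptions made for illustration. Each inner list of the large table stands in for one partition handled by one executor.

```python
import pickle
from collections import defaultdict


def broadcast_hash_join(small_rows, large_partitions, key):
    """Toy inner join: copy the small table to every partition, then probe it."""
    # The "driver" holds the whole small table and serializes it once.
    payload = pickle.dumps(small_rows)

    joined = []
    for partition in large_partitions:  # one partition ~ one executor task
        # The "executor" receives its own copy of the broadcast payload.
        local_copy = pickle.loads(payload)

        # Hash map keyed on the join column, for constant-time lookups.
        lookup = defaultdict(list)
        for row in local_copy:
            lookup[row[key]].append(row)

        # Probe phase: stream the large side; it is never moved or re-partitioned.
        for big_row in partition:
            for small_row in lookup.get(big_row[key], []):
                joined.append({**small_row, **big_row})
    return joined


# Hypothetical sample data: a small dimension table and a partitioned fact table.
countries = [
    {"code": "IN", "country": "India"},
    {"code": "US", "country": "United States"},
]
orders = [
    [{"order_id": 1, "code": "IN"}, {"order_id": 2, "code": "FR"}],
    [{"order_id": 3, "code": "US"}, {"order_id": 4, "code": "IN"}],
]

result = broadcast_hash_join(countries, orders, "code")
for row in result:
    print(row)
```

Order 2 has no matching country code, so it drops out of the inner join. The point of the sketch is the cost model: the small table is copied once per executor, and in exchange the large table stays where it is.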
Yogesh Verma