Yogesh_Verma_
Contributor II

What Spark Does During a Broadcast Join:

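  • Spark identifies the smaller table (say 80 MB). It is broadcast automatically only if its estimated size is below spark.sql.autoBroadcastJoinThreshold (10 MB by default), so a table this size needs a raised threshold or an explicit broadcast hint.
  • The driver collects this small table into its own JVM.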
  • The driver serializes the table into a broadcast variable.
  • The broadcast variable is shipped to all executors.
  • Executors store it inside the BlockManager storage region.
  • Each executor loads it into memory to build a hash map for fast joining.
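
To make the flow above concrete, here is a minimal pure-Python sketch that mimics the steps as listed. It is an illustration only, not Spark's actual implementation: the function name `broadcast_hash_join`, the sample tables, and the use of `pickle` for serialization are all assumptions made for this example, and each inner list of the large table simply stands in for a partition processed by an executor.

```python
import pickle
from collections import defaultdict


def broadcast_hash_join(small_rows, large_partitions, key):
    """Toy inner join that imitates the broadcast flow (hypothetical helper)."""
    # "Driver" side: gather the small table and serialize it once.
    payload = pickle.dumps(list(small_rows))

    joined = []
    for partition in large_partitions:  # each partition stands in for an executor
        # "Executor" side: receive and deserialize a full copy of the small table.
        local_copy = pickle.loads(payload)

        # Build an in-memory hash map keyed on the join column.
        lookup = defaultdict(list)
        for row in local_copy:
            lookup[row[key]].append(row)

        # Probe the map with each row of the large side; the large table
        # is never moved or re-partitioned.
        for big_row in partition:
            for small_row in lookup.get(big_row[key], []):
                joined.append({**small_row, **big_row})
    return joined


# Hypothetical sample data: a small dimension table and a partitioned fact table.
countries = [
    {"code": "IN", "country": "India"},
    {"code": "US", "country": "United States"},
]
orders = [
    [{"order_id": 1, "code": "IN"}, {"order_id": 2, "code": "FR"}],
    [{"order_id": 3, "code": "US"}],
]

result = broadcast_hash_join(countries, orders, "code")
for row in result:
    print(row)
```

The order with code "FR" has no match in the small table, so it drops out, as it would in an inner join. The point of the sketch is that every "executor" holds the whole small table in memory, which is why the broadcast side has to stay small.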
Yogesh Verma