Hello @smoortema , here are some helpful tips and tricks.
Here’s how to quickly determine which join strategy Spark used—between broadcast hash join, shuffle hash join, and sort-merge join—and how to read both the query plan and the Spark UI to verify it.
Quick answers
- The easiest way: run SQL EXPLAIN or DataFrame.explain to see the initial physical plan; look for operator names like BroadcastHashJoin, ShuffleHashJoin, or SortMergeJoin in the plan output.
- To see what was actually executed (especially with AQE enabled), use the Spark UI’s SQL tab. The diagram shows the current/final plan; join nodes are labeled and include metrics (rows output, shuffle read/write, broadcast size).
- With AQE, the initial plan shown by EXPLAIN may differ from the executed plan; the Spark UI reflects dynamic changes (e.g., SMJ converted to BHJ at runtime).
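As a quick sketch of the "scan the plan for operator names" step, here is a hedged, pure-Python helper (no Spark session needed; `join_strategy` is a hypothetical name, not a Spark API). It classifies a captured EXPLAIN string by the join operator it contains:

```python
# Minimal sketch: classify the join strategy from captured EXPLAIN output.
# The operator names are the ones Spark prints in physical plans; note that
# some Spark versions print "ShuffledHashJoin" (with a trailing 'd').

def join_strategy(plan_text: str) -> str:
    """Return the first join strategy found in a physical plan string."""
    markers = [
        ("BroadcastHashJoin", "broadcast hash join (BHJ)"),
        ("ShuffledHashJoin", "shuffle hash join (SHJ)"),
        ("ShuffleHashJoin", "shuffle hash join (SHJ)"),
        ("SortMergeJoin", "sort-merge join (SMJ)"),
    ]
    for operator, name in markers:
        if operator in plan_text:
            return name
    return "no hash/merge join operator found"

# Illustrative plan fragment (shape only, not real Spark output):
plan = "== Physical Plan ==\n*(2) BroadcastHashJoin [id], [id], Inner, BuildRight"
print(join_strategy(plan))  # broadcast hash join (BHJ)
```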
How to tell from the query plan
- Use SQL EXPLAIN or DataFrame.explain to inspect the physical plan before execution; scan for join nodes:
  - BroadcastHashJoin → broadcast hash join.
  - ShuffleHashJoin → shuffle hash join.
  - SortMergeJoin → sort-merge join.
- EXPLAIN always shows the initial plan and does not reflect AQE re-optimizations; compare EXPLAIN output with the Spark UI to see if AQE changed the join at runtime.
- In Databricks, AQE can dynamically change a planned sort-merge join into a broadcast hash join if a join side is under the adaptive broadcast threshold (default 30MB). Look for different join nodes between the initial and current/final plans to confirm the change.
- If using Photon, you may see Photon-specific operators (e.g., PhotonBroadcastHashJoin); this indicates Photon executed that part of the plan.
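To make the "compare initial vs. current/final plan" step concrete, here is a hedged pure-Python sketch (`join_ops` and `aqe_changed_join` are hypothetical helpers; the plan strings are illustrative, not real Spark output). It reports whether the join operator set differs between the two plans, and strips a Photon prefix so Photon operators match too:

```python
# Sketch: detect whether AQE changed the join strategy by comparing the
# join operators found in the initial plan (from EXPLAIN) and the final
# plan (as shown in the Spark UI). Plan strings here are illustrative.

JOIN_OPERATORS = ("BroadcastHashJoin", "ShuffledHashJoin",
                  "ShuffleHashJoin", "SortMergeJoin")

def join_ops(plan_text):
    # Drop an optional Photon prefix so PhotonBroadcastHashJoin matches too.
    text = plan_text.replace("Photon", "")
    return [op for op in JOIN_OPERATORS if op in text]

def aqe_changed_join(initial_plan, final_plan):
    return join_ops(initial_plan) != join_ops(final_plan)

initial = "SortMergeJoin [id], [id], Inner"
final = "BroadcastHashJoin [id], [id], Inner, BuildRight"
print(aqe_changed_join(initial, final))  # True: SMJ was converted to BHJ
```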
How to tell from the Spark UI
- Open the SQL tab → select the query → view the DAG/plan diagram. Join operators are labeled directly:
  - BroadcastHashJoin for BHJ.
  - ShuffleHashJoin for SHJ.
  - SortMergeJoin for SMJ.
- Hover or expand join nodes to see metrics:
  - Rows output can reveal “row explosion” (unexpectedly high output cardinality).
  - Shuffle read/write shows how much data moved for SHJ/SMJ.
  - Broadcast size appears for BHJ stages and helps confirm a broadcast happened.
- With AQE, the plan diagram can evolve during execution; the Spark UI shows the current/final executed plan, not the initial plan. Use it to verify runtime strategy changes (e.g., SMJ → BHJ) and optimizations like partition coalescing or skew handling via CustomShuffleReader annotations (coalesced/skewed).
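As a hedged sketch of reading those annotations out of a captured plan string (`aqe_annotations` is a hypothetical helper; exact labels vary by version, with newer open-source Spark showing AQEShuffleRead instead of CustomShuffleReader):

```python
# Sketch: pull AQE shuffle-read annotations out of a plan string.
# CustomShuffleReader is the label in older Spark/Databricks plans
# (AQEShuffleRead in newer Spark); the plan text below is illustrative.

def aqe_annotations(plan_text):
    reader = ("CustomShuffleReader" in plan_text
              or "AQEShuffleRead" in plan_text)
    return {
        "aqe_shuffle_reader": reader,
        "coalesced": reader and "coalesced" in plan_text,
        "skewed": reader and "skewed" in plan_text,
    }

plan = "SortMergeJoin(isSkew=true)\n  CustomShuffleReader skewed"
flags = aqe_annotations(plan)
print(flags["skewed"])  # True
```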
Notes about AQE (Adaptive Query Execution)
- AQE may switch sort-merge join to broadcast hash join at runtime based on accurate post-shuffle statistics; the threshold for dynamic switch is spark.databricks.adaptive.autoBroadcastJoinThreshold (default 30MB).
- EXPLAIN does not execute the query, so it shows the initial plan only; the Spark UI shows the plan as it evolves and the final executed plan, making it the authoritative source for what actually ran under AQE.
- AQE also handles skew in SMJ/SHJ by splitting skewed partitions; you’ll see indicators like SortMergeJoin with isSkew=true and CustomShuffleReader with skewed in the plan/UI.
Forcing or controlling join types (when needed)
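You can steer the planner with join hints: in SQL, e.g. SELECT /*+ BROADCAST(d) */ ... FROM f JOIN d ON ..., or on a DataFrame with small_df.hint("broadcast"). Per the Spark join-hints docs, when different relations carry different hints, Spark prefers BROADCAST over MERGE over SHUFFLE_HASH over SHUFFLE_REPLICATE_NL. A small sketch of that priority rule (`winning_hint` is a hypothetical helper, not a Spark API):

```python
# Sketch of Spark's join-hint priority: BROADCAST > MERGE > SHUFFLE_HASH
# > SHUFFLE_REPLICATE_NL when multiple hints are supplied.
# Hypothetical helper for illustration, not a Spark API.

HINT_PRIORITY = ["BROADCAST", "MERGE", "SHUFFLE_HASH", "SHUFFLE_REPLICATE_NL"]

def winning_hint(hints):
    """Return the hint Spark would honor among those supplied."""
    for hint in HINT_PRIORITY:
        if hint in hints:
            return hint
    return None

# SQL form:       SELECT /*+ BROADCAST(d) */ ... FROM f JOIN d ON ...
# DataFrame form: large_df.join(small_df.hint("broadcast"), "key")
print(winning_hint(["MERGE", "BROADCAST"]))  # BROADCAST
```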
Practical checklist
- Before running:
  - EXPLAIN your query; confirm the planned join node names match expectations (BHJ/SHJ/SMJ).
- After running:
  - Spark UI → SQL tab → check the join node label and metrics to see what actually executed and whether AQE changed it.
- If the executed plan isn’t the one you want:
  - Consider adding a join hint or adjusting configs (autoBroadcastJoinThreshold, preferSortMergeJoin) and rerun; verify again in EXPLAIN and the Spark UI.
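The broadcast thresholds accept byte-size strings (e.g. spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "10m")). A hedged sketch for sanity-checking whether a table fits under a threshold (`size_to_bytes` is a hypothetical helper, not a Spark API):

```python
# Sketch: convert a byte-size string ("10m", "30MB") to bytes so you can
# compare a table's size against a threshold. Hypothetical helper.
# Typical config calls (not executed here):
#   spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "10m")
#   spark.conf.set("spark.sql.join.preferSortMergeJoin", "false")

UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}

def size_to_bytes(size):
    s = size.strip().lower().rstrip("b")  # accept "30mb" or "30m"
    if s[-1] in UNITS:
        return int(s[:-1]) * UNITS[s[-1]]
    return int(s)  # plain byte count

# A 25MB dimension table fits under the 30MB adaptive broadcast default:
print(size_to_bytes("25m") < size_to_bytes("30MB"))  # True
```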
Useful references
- Adaptive Query Execution user guide (plans, Spark UI behavior, configs, dynamic BHJ conversion).
- Join hints syntax and priority (BROADCAST/MERGE/SHUFFLE_HASH/SHUFFLE_REPLICATE_NL).
- Best practices for choosing BHJ vs SMJ vs SHJ and reading Spark UI join metrics.
- AQE blog posts (identifying strategy changes and CustomShuffleReader coalesce/skew indicators).
Hope this helps, Louis.