Hello @smoortema , here are some helpful tips and tricks.
Here’s how to quickly determine which join strategy Spark used—between broadcast hash join, shuffle hash join, and sort-merge join—and how to read both the query plan and the Spark UI to verify it.
Quick answers
- The easiest way: run SQL EXPLAIN or DataFrame.explain to see the initial physical plan; look for operator names like BroadcastHashJoin, ShuffleHashJoin, or SortMergeJoin in the plan output.
- To see what was actually executed (especially with AQE enabled), use the Spark UI’s SQL tab. The diagram shows the current/final plan; join nodes are labeled and include metrics (rows output, shuffle read/write, broadcast size).
- With AQE, the initial plan shown by EXPLAIN may differ from the executed plan; the Spark UI reflects dynamic changes (e.g., SMJ converted to BHJ at runtime).
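As a quick sketch of the "scan the plan for operator names" step, here is a hedged, pure-Python helper (no Spark session needed; `join_strategy` is a hypothetical name, not a Spark API). It classifies a captured EXPLAIN string by the join operator it contains:

```python
# Minimal sketch: classify the join strategy from captured EXPLAIN output.
# The operator names are the ones Spark prints in physical plans; note that
# some Spark versions print "ShuffledHashJoin" (with a trailing 'd').

def join_strategy(plan_text: str) -> str:
    """Return the first join strategy found in a physical plan string."""
    markers = [
        ("BroadcastHashJoin", "broadcast hash join (BHJ)"),
        ("ShuffledHashJoin", "shuffle hash join (SHJ)"),
        ("ShuffleHashJoin", "shuffle hash join (SHJ)"),
        ("SortMergeJoin", "sort-merge join (SMJ)"),
    ]
    for operator, name in markers:
        if operator in plan_text:
            return name
    return "no hash/merge join operator found"

# Illustrative plan fragment (shape only, not real Spark output):
plan = "== Physical Plan ==\n*(2) BroadcastHashJoin [id], [id], Inner, BuildRight"
print(join_strategy(plan))  # broadcast hash join (BHJ)
```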
How to tell from the query plan
- Use SQL EXPLAIN or DataFrame.explain to inspect the physical plan before execution; scan for join nodes:
  - BroadcastHashJoin → broadcast hash join.
  - ShuffleHashJoin → shuffle hash join.
  - SortMergeJoin → sort-merge join.
- EXPLAIN always shows the initial plan and does not reflect AQE re-optimizations; compare EXPLAIN output with the Spark UI to see if AQE changed the join at runtime.
- In Databricks, AQE can dynamically change a planned sort-merge join into a broadcast hash join if a join side is under the adaptive broadcast threshold (default 30MB). Look for different join nodes between the initial and current/final plans to confirm the change.
- If using Photon, you may see Photon-specific operators (e.g., PhotonBroadcastHashJoin); this indicates Photon executed that part of the plan.
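To make the "compare initial vs. current/final plan" step concrete, here is a hedged pure-Python sketch (`join_ops` and `aqe_changed_join` are hypothetical helpers; the plan strings are illustrative, not real Spark output). It reports whether the join operator set differs between the two plans, and strips a Photon prefix so Photon operators match too:

```python
# Sketch: detect whether AQE changed the join strategy by comparing the
# join operators found in the initial plan (from EXPLAIN) and the final
# plan (as shown in the Spark UI). Plan strings here are illustrative.

JOIN_OPERATORS = ("BroadcastHashJoin", "ShuffledHashJoin",
                  "ShuffleHashJoin", "SortMergeJoin")

def join_ops(plan_text):
    # Drop an optional Photon prefix so PhotonBroadcastHashJoin matches too.
    text = plan_text.replace("Photon", "")
    return [op for op in JOIN_OPERATORS if op in text]

def aqe_changed_join(initial_plan, final_plan):
    return join_ops(initial_plan) != join_ops(final_plan)

initial = "SortMergeJoin [id], [id], Inner"
final = "BroadcastHashJoin [id], [id], Inner, BuildRight"
print(aqe_changed_join(initial, final))  # True: SMJ was converted to BHJ
```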
How to tell from the Spark UI
- Open the SQL tab → select the query → view the DAG/plan diagram. Join operators are labeled directly:
  - BroadcastHashJoin for BHJ.
  - ShuffleHashJoin for SHJ.
  - SortMergeJoin for SMJ.
- Hover or expand join nodes to see metrics:
  - Rows output can reveal “row explosion” (unexpectedly high output cardinality).
  - Shuffle read/write shows how much data moved for SHJ/SMJ.
  - Broadcast size appears for BHJ stages and helps confirm a broadcast happened.
- With AQE, the plan diagram can evolve during execution; the Spark UI shows the current/final executed plan, not the initial plan. Use it to verify runtime strategy changes (e.g., SMJ → BHJ) and optimizations like partition coalescing or skew handling via CustomShuffleReader annotations (coalesced/skewed).
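As a hedged sketch of reading those annotations out of a captured plan string (`aqe_annotations` is a hypothetical helper; exact labels vary by version, with newer open-source Spark showing AQEShuffleRead instead of CustomShuffleReader):

```python
# Sketch: pull AQE shuffle-read annotations out of a plan string.
# CustomShuffleReader is the label in older Spark/Databricks plans
# (AQEShuffleRead in newer Spark); the plan text below is illustrative.

def aqe_annotations(plan_text):
    reader = ("CustomShuffleReader" in plan_text
              or "AQEShuffleRead" in plan_text)
    return {
        "aqe_shuffle_reader": reader,
        "coalesced": reader and "coalesced" in plan_text,
        "skewed": reader and "skewed" in plan_text,
    }

plan = "SortMergeJoin(isSkew=true)\n  CustomShuffleReader skewed"
flags = aqe_annotations(plan)
print(flags["skewed"])  # True
```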
Notes about AQE (Adaptive Query Execution)
- AQE may switch sort-merge join to broadcast hash join at runtime based on accurate post-shuffle statistics; the threshold for dynamic switch is spark.databricks.adaptive.autoBroadcastJoinThreshold (default 30MB).
- EXPLAIN does not execute the query, so it shows the initial plan only; the Spark UI shows the plan as it evolves and the final executed plan, making it the authoritative source for what actually ran under AQE.
- AQE also handles skew in SMJ/SHJ by splitting skewed partitions; you’ll see indicators like SortMergeJoin with isSkew=true and CustomShuffleReader with skewed in the plan/UI.
Forcing or controlling join types (when needed)
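You can steer the planner with join hints: in SQL, e.g. SELECT /*+ BROADCAST(d) */ ... FROM f JOIN d ON ..., or on a DataFrame with small_df.hint("broadcast"). Per the Spark join-hints docs, when different relations carry different hints, Spark prefers BROADCAST over MERGE over SHUFFLE_HASH over SHUFFLE_REPLICATE_NL. A small sketch of that priority rule (`winning_hint` is a hypothetical helper, not a Spark API):

```python
# Sketch of Spark's join-hint priority: BROADCAST > MERGE > SHUFFLE_HASH
# > SHUFFLE_REPLICATE_NL when multiple hints are supplied.
# Hypothetical helper for illustration, not a Spark API.

HINT_PRIORITY = ["BROADCAST", "MERGE", "SHUFFLE_HASH", "SHUFFLE_REPLICATE_NL"]

def winning_hint(hints):
    """Return the hint Spark would honor among those supplied."""
    for hint in HINT_PRIORITY:
        if hint in hints:
            return hint
    return None

# SQL form:       SELECT /*+ BROADCAST(d) */ ... FROM f JOIN d ON ...
# DataFrame form: large_df.join(small_df.hint("broadcast"), "key")
print(winning_hint(["MERGE", "BROADCAST"]))  # BROADCAST
```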
Practical checklist
- Before running:
  - EXPLAIN your query; confirm the planned join node names match expectations (BHJ/SHJ/SMJ).
- After running:
  - Spark UI → SQL tab → check the join node label and metrics to see what actually executed and whether AQE changed it.
- If the executed plan isn’t the one you want:
  - Consider adding a join hint or adjusting configs (autoBroadcastJoinThreshold, preferSortMergeJoin) and rerun; verify again in EXPLAIN and the Spark UI.
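The broadcast thresholds accept byte-size strings (e.g. spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "10m")). A hedged sketch for sanity-checking whether a table fits under a threshold (`size_to_bytes` is a hypothetical helper, not a Spark API):

```python
# Sketch: convert a byte-size string ("10m", "30MB") to bytes so you can
# compare a table's size against a threshold. Hypothetical helper.
# Typical config calls (not executed here):
#   spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "10m")
#   spark.conf.set("spark.sql.join.preferSortMergeJoin", "false")

UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}

def size_to_bytes(size):
    s = size.strip().lower().rstrip("b")  # accept "30mb" or "30m"
    if s[-1] in UNITS:
        return int(s[:-1]) * UNITS[s[-1]]
    return int(s)  # plain byte count

# A 25MB dimension table fits under the 30MB adaptive broadcast default:
print(size_to_bytes("25m") < size_to_bytes("30MB"))  # True
```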
Useful references
- Adaptive Query Execution user guide (plans, Spark UI behavior, configs, dynamic BHJ conversion).
- Join hints syntax and priority (BROADCAST/MERGE/SHUFFLE_HASH/SHUFFLE_REPLICATE_NL).
- Best practices for choosing BHJ vs SMJ vs SHJ and reading Spark UI join metrics.
- AQE blog posts (identifying strategy changes and CustomShuffleReader coalesce/skew indicators).
Hope this helps, Louis.