Hello @Travis84 ,
Below are the answers to your questions:
Where to put the hint? On either one of the two relations that participate in the range join for that specific join block. In simple two-table queries, it doesnโt matter. In multi-join queries, you must attach it to the relation involved in the range join at that stage (SQL is left-associative, so (a JOIN b) JOIN c: a hint on a affects a JOIN b, not the later join to c).
Where in the SQL does it go? As a SELECT hint, immediately after SELECT, e.g. SELECT /*+ RANGE_JOIN(points, 60) */ โฆ
In the SQL above, your condition is r.ts BETWEEN p.ts - INTERVAL 1 MINUTE AND p.ts.
For the range-join optimisation on a LEFT join, Databricks requires the point value to be on the left side of the join. You can rewrite the predicate equivalently as:
p.ts BETWEEN r.ts AND r.ts + INTERVAL 1 MINUTE and then apply the hint.
Example:
SELECT /*+ RANGE_JOIN(p, 60) */
p.id, p.ts, p.value,
r.metric1, r.metric2, r.ts AS range_ts
FROM points p
LEFT JOIN ranges r
ON r.point_id = p.id
AND p.ts BETWEEN r.ts AND r.ts + INTERVAL 1 MINUTE
QUALIFY row_number() OVER (PARTITION BY p.id, p.ts ORDER BY r.ts DESC) = 1;