Which table should i use for a range join hint?

Travis84
New Contributor II

I am a bit confused about how to use range join hints. Consider the following query

 

 
```
SELECT
  p.id,
  p.ts,
  p.value,
  rg.metric1,
  rg.metric2,
  rg.ts AS range_ts
FROM points p
LEFT JOIN LATERAL (
  SELECT r.metric1, r.metric2, r.ts
  FROM ranges r
  WHERE r.point_id = p.id
    AND r.ts BETWEEN p.ts - INTERVAL 1 MINUTE AND p.ts
  ORDER BY r.ts DESC
  LIMIT 1
) rg
```
should the hint be applies on points or on ranges?
does the hint go immediately after the first select statement?
 
In the documentation (Range join optimization | Databricks on AWS) the first example shows the hint applying to the points table, the 3rd example shows the hint applying to the ranges like table.

K_Anudeep
Databricks Employee
Databricks Employee

Hello @Travis84 ,

Below are the answers to your questions:

Where to put the hint? On either one of the two relations that participate in the range join for that specific join block. In simple two-table queries, it doesn’t matter. In multi-join queries, you must attach it to the relation involved in the range join at that stage (SQL is left-associative, so (a JOIN b) JOIN c: a hint on a affects a JOIN b, not the later join to c).

Where in the SQL does it go? As a SELECT hint, immediately after SELECT, e.g. SELECT /*+ RANGE_JOIN(points, 60) */ …

In the SQL above, your condition is r.ts BETWEEN p.ts - INTERVAL 1 MINUTE AND p.ts.
For the range-join optimisation on a LEFT join, Databricks requires the point value to be on the left side of the join. You can rewrite the predicate equivalently as:
p.ts BETWEEN r.ts AND r.ts + INTERVAL 1 MINUTE and then apply the hint.

Example:

SELECT /*+ RANGE_JOIN(p, 60) */
p.id, p.ts, p.value,
r.metric1, r.metric2, r.ts AS range_ts
FROM points p
LEFT JOIN ranges r
ON r.point_id = p.id
AND p.ts BETWEEN r.ts AND r.ts + INTERVAL 1 MINUTE
QUALIFY row_number() OVER (PARTITION BY p.id, p.ts ORDER BY r.ts DESC) = 1;

 

Or
 
LEFT JOIN LATERAL (
SELECT /*+ RANGE_JOIN(r, 60) */ r.metric1, r.metric2, r.ts
FROM ranges r
WHERE r.point_id = p.id
AND p.ts BETWEEN r.ts AND r.ts + INTERVAL 1 MINUTE
ORDER BY r.ts DESC
LIMIT 1
) rg
Anudeep