<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Which table should i use for a range join hint? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/which-table-should-i-use-for-a-range-join-hint/m-p/132896#M49669</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/181042"&gt;@Travis84&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Below are the answers to your questions:&lt;/P&gt;
&lt;P&gt;Where to put the hint? On either one of the two relations that participate in the range join for that specific join block. In simple two-table queries, it doesn’t matter. In multi-join queries, you must attach it to the relation involved in the range join at that stage (SQL is left-associative, so (a JOIN b) JOIN c: a hint on a affects a JOIN b, not the later join to c).&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Where in the SQL does it go?&lt;/STRONG&gt; As a SELECT hint, immediately after SELECT, e.g. SELECT /*+ RANGE_JOIN(points, 60) */ …&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;In the SQL above, your condition is &lt;STRONG&gt;r.ts BETWEEN p.ts - INTERVAL 1 MINUTE AND p.ts.&lt;/STRONG&gt;&lt;BR /&gt;For the range-join optimisation on a LEFT join, Databricks requires the point value to be on the left side of the join. You can rewrite the predicate equivalently as:&lt;BR /&gt;&lt;STRONG style="font-family: inherit;"&gt;p.ts BETWEEN r.ts AND r.ts + INTERVAL 1 MINUTE&amp;nbsp;&lt;/STRONG&gt;&lt;SPAN&gt;and then a&lt;/SPAN&gt;&lt;SPAN&gt;pply the hint&lt;/SPAN&gt;&lt;SPAN&gt;. &lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Example:&lt;BR /&gt;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;SELECT /*+ RANGE_JOIN(p, 60) */
p.id, p.ts, p.value,
r.metric1, r.metric2, r.ts AS range_ts
FROM points p
LEFT JOIN ranges r
ON r.point_id = p.id
AND p.ts BETWEEN r.ts AND r.ts + INTERVAL 1 MINUTE
QUALIFY row_number() OVER (PARTITION BY p.id, p.ts ORDER BY r.ts DESC) = 1;&lt;/LI-CODE&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="contain-inline-size rounded-2xl relative bg-token-sidebar-surface-primary"&gt;
&lt;DIV class="overflow-y-auto p-4" dir="ltr"&gt;&lt;STRONG&gt;Or&lt;/STRONG&gt;&lt;/DIV&gt;
&lt;DIV class="overflow-y-auto p-4" dir="ltr"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="overflow-y-auto p-4" dir="ltr"&gt;&lt;LI-CODE lang="python"&gt;LEFT JOIN LATERAL (
SELECT /*+ RANGE_JOIN(r, 60) */ r.metric1, r.metric2, r.ts
FROM ranges r
WHERE r.point_id = p.id
AND p.ts BETWEEN r.ts AND r.ts + INTERVAL 1 MINUTE
ORDER BY r.ts DESC
LIMIT 1
) rg&lt;/LI-CODE&gt;&lt;/DIV&gt;
&lt;/DIV&gt;</description>
    <pubDate>Wed, 24 Sep 2025 05:42:19 GMT</pubDate>
    <dc:creator>K_Anudeep</dc:creator>
    <dc:date>2025-09-24T05:42:19Z</dc:date>
    <item>
      <title>Which table should i use for a range join hint?</title>
      <link>https://community.databricks.com/t5/data-engineering/which-table-should-i-use-for-a-range-join-hint/m-p/132894#M49668</link>
      <description>&lt;P&gt;I am a bit confused about how to use range join hints. Consider the following query&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;```&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;SELECT&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; p.id,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; p.ts,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; p.value,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; rg.metric1,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; rg.metric2,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; rg.ts &lt;/SPAN&gt;&lt;SPAN&gt;AS&lt;/SPAN&gt;&lt;SPAN&gt; range_ts&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;FROM&lt;/SPAN&gt;&lt;SPAN&gt; points p&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;LEFT JOIN&lt;/SPAN&gt;&lt;SPAN&gt; LATERAL (&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;SELECT&lt;/SPAN&gt;&lt;SPAN&gt; r.metric1, r.metric2, r.ts&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;FROM&lt;/SPAN&gt;&lt;SPAN&gt; ranges r&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;WHERE&lt;/SPAN&gt;&lt;SPAN&gt; r.point_id &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; p.id&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;AND&lt;/SPAN&gt;&lt;SPAN&gt; r.ts &lt;/SPAN&gt;&lt;SPAN&gt;BETWEEN&lt;/SPAN&gt;&lt;SPAN&gt; p.ts &lt;/SPAN&gt;&lt;SPAN&gt;-&lt;/SPAN&gt;&lt;SPAN&gt; INTERVAL &lt;/SPAN&gt;&lt;SPAN&gt;1&lt;/SPAN&gt; &lt;SPAN&gt;MINUTE&lt;/SPAN&gt; &lt;SPAN&gt;AND&lt;/SPAN&gt;&lt;SPAN&gt; p.ts&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;ORDER BY&lt;/SPAN&gt;&lt;SPAN&gt; r.ts &lt;/SPAN&gt;&lt;SPAN&gt;DESC&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;LIMIT&lt;/SPAN&gt; &lt;SPAN&gt;1&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;) rg&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;```&lt;/DIV&gt;&lt;DIV&gt;should the hint be applies on points or on ranges?&lt;/DIV&gt;&lt;DIV&gt;does the hint go immediately after the first select statement?&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;In the documentation (&lt;A href="https://docs.databricks.com/aws/en/optimizations/range-join#enable-range-join-using-a-range-join-hint" target="_blank" rel="noopener"&gt;Range join optimization | Databricks on AWS&lt;/A&gt;) the first example shows the hint applying to the points table, the 3rd example shows the hint applying to the ranges like table.&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 24 Sep 2025 05:14:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/which-table-should-i-use-for-a-range-join-hint/m-p/132894#M49668</guid>
      <dc:creator>Travis84</dc:creator>
      <dc:date>2025-09-24T05:14:09Z</dc:date>
    </item>
    <item>
      <title>Re: Which table should i use for a range join hint?</title>
      <link>https://community.databricks.com/t5/data-engineering/which-table-should-i-use-for-a-range-join-hint/m-p/132896#M49669</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/181042"&gt;@Travis84&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Below are the answers to your questions:&lt;/P&gt;
&lt;P&gt;Where to put the hint? On either one of the two relations that participate in the range join for that specific join block. In simple two-table queries, it doesn’t matter. In multi-join queries, you must attach it to the relation involved in the range join at that stage (SQL is left-associative, so (a JOIN b) JOIN c: a hint on a affects a JOIN b, not the later join to c).&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Where in the SQL does it go?&lt;/STRONG&gt; As a SELECT hint, immediately after SELECT, e.g. SELECT /*+ RANGE_JOIN(points, 60) */ …&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;In the SQL above, your condition is &lt;STRONG&gt;r.ts BETWEEN p.ts - INTERVAL 1 MINUTE AND p.ts.&lt;/STRONG&gt;&lt;BR /&gt;For the range-join optimisation on a LEFT join, Databricks requires the point value to be on the left side of the join. You can rewrite the predicate equivalently as:&lt;BR /&gt;&lt;STRONG style="font-family: inherit;"&gt;p.ts BETWEEN r.ts AND r.ts + INTERVAL 1 MINUTE&amp;nbsp;&lt;/STRONG&gt;&lt;SPAN&gt;and then a&lt;/SPAN&gt;&lt;SPAN&gt;pply the hint&lt;/SPAN&gt;&lt;SPAN&gt;. &lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Example:&lt;BR /&gt;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;SELECT /*+ RANGE_JOIN(p, 60) */
p.id, p.ts, p.value,
r.metric1, r.metric2, r.ts AS range_ts
FROM points p
LEFT JOIN ranges r
ON r.point_id = p.id
AND p.ts BETWEEN r.ts AND r.ts + INTERVAL 1 MINUTE
QUALIFY row_number() OVER (PARTITION BY p.id, p.ts ORDER BY r.ts DESC) = 1;&lt;/LI-CODE&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="contain-inline-size rounded-2xl relative bg-token-sidebar-surface-primary"&gt;
&lt;DIV class="overflow-y-auto p-4" dir="ltr"&gt;&lt;STRONG&gt;Or&lt;/STRONG&gt;&lt;/DIV&gt;
&lt;DIV class="overflow-y-auto p-4" dir="ltr"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="overflow-y-auto p-4" dir="ltr"&gt;&lt;LI-CODE lang="python"&gt;LEFT JOIN LATERAL (
SELECT /*+ RANGE_JOIN(r, 60) */ r.metric1, r.metric2, r.ts
FROM ranges r
WHERE r.point_id = p.id
AND p.ts BETWEEN r.ts AND r.ts + INTERVAL 1 MINUTE
ORDER BY r.ts DESC
LIMIT 1
) rg&lt;/LI-CODE&gt;&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Wed, 24 Sep 2025 05:42:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/which-table-should-i-use-for-a-range-join-hint/m-p/132896#M49669</guid>
      <dc:creator>K_Anudeep</dc:creator>
      <dc:date>2025-09-24T05:42:19Z</dc:date>
    </item>
  </channel>
</rss>

