<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Doubt with range_join hints optimization, using INSERT INTO REPLACE WHERE in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/doubt-with-range-join-hints-optimization-using-insert-into/m-p/122636#M46833</link>
    <description>&lt;P&gt;Hello&lt;/P&gt;&lt;P&gt;Im optmizing a big notebook and have encountered many times the tip from databricks that says "&lt;SPAN&gt;Unused range join hints&lt;/SPAN&gt;". Reading the&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/optimizations/range-join" target="_self"&gt;documentation&lt;/A&gt; for reference, I have been able to supress that warning in almost all cells, but some of then remains that I dont understand where to place the hint.&lt;/P&gt;&lt;P&gt;This problematic cells are of the same type (first cell of the image attached).&lt;/P&gt;&lt;P&gt;In the second cell there is no warning (i am using the same temp table in 1º and 2º cell) and that would mean that there is no problem building that dataset. That being said, I think the problem is in the INSERT INTO ..... REPLACE WHERE.&lt;/P&gt;&lt;P&gt;Where can I find documentation about the range_joins with the INSERT INTO REPLACE WHERE operation? The one I referenced talks only about joins, and &lt;A href="https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-dml-insert-into" target="_self"&gt;here&lt;/A&gt;&amp;nbsp;(insert into replace where doc) there is no clue either.&lt;/P&gt;&lt;P&gt;Any idea to optimize this operation also?&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="range_joins.JPG" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/17724i9A4D80198912E645/image-size/large?v=v2&amp;amp;px=999" role="button" title="range_joins.JPG" alt="range_joins.JPG" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;SPAN&gt;Thank you.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 24 Jun 2025 09:02:40 GMT</pubDate>
    <dc:creator>Gustavo_Az</dc:creator>
    <dc:date>2025-06-24T09:02:40Z</dc:date>
    <item>
      <title>Doubt with range_join hints optimization, using INSERT INTO REPLACE WHERE</title>
      <link>https://community.databricks.com/t5/data-engineering/doubt-with-range-join-hints-optimization-using-insert-into/m-p/122636#M46833</link>
      <description>&lt;P&gt;Hello&lt;/P&gt;&lt;P&gt;Im optmizing a big notebook and have encountered many times the tip from databricks that says "&lt;SPAN&gt;Unused range join hints&lt;/SPAN&gt;". Reading the&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/optimizations/range-join" target="_self"&gt;documentation&lt;/A&gt; for reference, I have been able to supress that warning in almost all cells, but some of then remains that I dont understand where to place the hint.&lt;/P&gt;&lt;P&gt;This problematic cells are of the same type (first cell of the image attached).&lt;/P&gt;&lt;P&gt;In the second cell there is no warning (i am using the same temp table in 1º and 2º cell) and that would mean that there is no problem building that dataset. That being said, I think the problem is in the INSERT INTO ..... REPLACE WHERE.&lt;/P&gt;&lt;P&gt;Where can I find documentation about the range_joins with the INSERT INTO REPLACE WHERE operation? The one I referenced talks only about joins, and &lt;A href="https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-dml-insert-into" target="_self"&gt;here&lt;/A&gt;&amp;nbsp;(insert into replace where doc) there is no clue either.&lt;/P&gt;&lt;P&gt;Any idea to optimize this operation also?&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="range_joins.JPG" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/17724i9A4D80198912E645/image-size/large?v=v2&amp;amp;px=999" role="button" title="range_joins.JPG" alt="range_joins.JPG" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;SPAN&gt;Thank you.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jun 2025 09:02:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/doubt-with-range-join-hints-optimization-using-insert-into/m-p/122636#M46833</guid>
      <dc:creator>Gustavo_Az</dc:creator>
      <dc:date>2025-06-24T09:02:40Z</dc:date>
    </item>
    <item>
      <title>Re: Doubt with range_join hints optimization, using INSERT INTO REPLACE WHERE</title>
      <link>https://community.databricks.com/t5/data-engineering/doubt-with-range-join-hints-optimization-using-insert-into/m-p/135476#M50357</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/74680"&gt;@Gustavo_Az&lt;/a&gt;&lt;/P&gt;&lt;P&gt;Try to use explain to understand what's happening.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://spark.apache.org/docs/latest/sql-ref-syntax-qry-explain.html" target="_blank"&gt;https://spark.apache.org/docs/latest/sql-ref-syntax-qry-explain.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Oct 2025 06:03:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/doubt-with-range-join-hints-optimization-using-insert-into/m-p/135476#M50357</guid>
      <dc:creator>Prajapathy_NKR</dc:creator>
      <dc:date>2025-10-21T06:03:43Z</dc:date>
    </item>
    <item>
      <title>Re: Doubt with range_join hints optimization, using INSERT INTO REPLACE WHERE</title>
      <link>https://community.databricks.com/t5/data-engineering/doubt-with-range-join-hints-optimization-using-insert-into/m-p/136622#M50607</link>
      <description>&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;There is no official documentation covering the use of&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;range_join&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;hints directly with the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;INSERT INTO ... REPLACE WHERE&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;operation in Databricks—existing documentation around range joins focuses only on explicit joining operations, not on conditional insert statements like REPLACE WHERE. The warning about "Unused range join hints" typically appears when hints are present in queries where Databricks does not use them, especially outside SELECT JOIN clauses.​&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Range Join Hint Functionality&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;range_join&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;hint is meant for optimizing joins that include range predicates (e.g., where one table’s values must fall within a range of another’s).​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Hints must go on a&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;SELECT ... JOIN ...&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;statement or on Spark DataFrame&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.join()&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;with the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.hint()&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;method, not on INSERT statements or their predicates.​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;When used within non-join statements (like INSERT INTO with REPLACE WHERE), Databricks will ignore such hints and may issue the unused hint warning.​&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;REPLACE WHERE Limitations&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;REPLACE WHERE&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;clause for INSERT INTO applies only to Delta tables (Databricks Runtime 12.2 LTS+).&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;It atomically deletes and then inserts rows matching your predicate, lending precision and atomicity to data replacement.​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;It’s unrelated to join internals, so join hints like&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;range_join&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;do not modify or optimize the REPLACE WHERE operation.​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;To optimize this operation, focus on indexing, partitioning, and file compaction (using OPTIMIZE), rather than join hints.​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;For batch efficiency, ensure your source query is well-partitioned and consider using MERGE INTO if doing upserts.​&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Optimization Tips for INSERT INTO REPLACE WHERE&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Index columns used in the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;REPLACE WHERE&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;predicate for faster lookup and atomic replacement.​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Use partitioning wisely—if you’re replacing a subset, partition by the predicate column(s) to avoid expensive full-table scans.​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Regularly use the OPTIMIZE command on your Delta tables to compact small files.​&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Consider&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;MERGE INTO&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;for complex upserts instead of multiple REPLACE WHERE statements.​&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;In summary, range join hints are not used or valid with INSERT INTO ... REPLACE WHERE in Databricks, and any optimization must focus on Delta table management, indexing, and source query structure, not join hints.​&lt;/P&gt;</description>
      <pubDate>Wed, 29 Oct 2025 20:08:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/doubt-with-range-join-hints-optimization-using-insert-into/m-p/136622#M50607</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-10-29T20:08:36Z</dc:date>
    </item>
  </channel>
</rss>

