<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Inconsistent Decimal Comparison Behavior Between SQL Warehouse (Photon) and Spark Clusters in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/inconsistent-decimal-comparison-behavior-between-sql-warehouse/m-p/122378#M46756</link>
    <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm seeing non-deterministic behavior when running the same query in SQL Warehouse (Photon) vs. interactive/job clusters (non-Photon), specifically involving a LEFT OUTER JOIN and a DECIMAL comparison in a WHERE clause.&lt;/P&gt;&lt;H3&gt;&amp;nbsp;&lt;/H3&gt;&lt;P&gt;I have two views:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;View A: contains COL1 → DECIMAL(21,5)&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;View B: contains COL2 → DECIMAL(38,6)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;I’m joining them on two keys and comparing the decimal values in a filter:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;sql&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN&gt;&lt;SPAN class=""&gt;SELECT&lt;/SPAN&gt; ... &lt;SPAN class=""&gt;FROM&lt;/SPAN&gt; A &lt;SPAN class=""&gt;LEFT&lt;/SPAN&gt; &lt;SPAN class=""&gt;OUTER&lt;/SPAN&gt; &lt;SPAN class=""&gt;JOIN&lt;/SPAN&gt; B &lt;SPAN class=""&gt;ON&lt;/SPAN&gt; A.key1 &lt;SPAN class=""&gt;=&lt;/SPAN&gt; B.key1 &lt;SPAN class=""&gt;AND&lt;/SPAN&gt; A.key2 &lt;SPAN class=""&gt;=&lt;/SPAN&gt; B.key2 &lt;SPAN class=""&gt;WHERE&lt;/SPAN&gt; A.COL1 &lt;SPAN class=""&gt;!=&lt;/SPAN&gt; B.COL2&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;HR /&gt;&lt;H3&gt;&amp;nbsp;Unexpected Behavior&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;When I run this in &lt;STRONG&gt;SQL Warehouse (Photon)&lt;/STRONG&gt;: I get a consistent and correct result.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;When I run the exact same query in a &lt;STRONG&gt;non-Photon interactive or job cluster&lt;/STRONG&gt;:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;I get &lt;STRONG&gt;inconsistent results across runs&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;The output row count changes on each run&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;When I &lt;STRONG&gt;narrow the filter to a specific key&lt;/STRONG&gt;, e.g., EAN_CODE = '1234567890', the result is correct — but on the full dataset, results are wrong or unstable&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;HR /&gt;&lt;H3&gt;&amp;nbsp;What I’ve Tried&lt;/H3&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Casting both columns to the same type&lt;/STRONG&gt;:&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;sql&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN&gt;&lt;SPAN class=""&gt;WHERE&lt;/SPAN&gt; &lt;SPAN class=""&gt;CAST&lt;/SPAN&gt;(A.COL1 &lt;SPAN class=""&gt;AS&lt;/SPAN&gt; &lt;SPAN class=""&gt;DECIMAL&lt;/SPAN&gt;(&lt;SPAN class=""&gt;38&lt;/SPAN&gt;,&lt;SPAN class=""&gt;6&lt;/SPAN&gt;)) &lt;SPAN class=""&gt;!=&lt;/SPAN&gt; &lt;SPAN class=""&gt;CAST&lt;/SPAN&gt;(B.COL2 &lt;SPAN class=""&gt;AS&lt;/SPAN&gt; &lt;SPAN class=""&gt;DECIMAL&lt;/SPAN&gt;(&lt;SPAN class=""&gt;38&lt;/SPAN&gt;,&lt;SPAN class=""&gt;6&lt;/SPAN&gt;)) &lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;→ Still inconsistent&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Comparing as strings&lt;/STRONG&gt;:&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;sql&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN&gt;&lt;SPAN class=""&gt;WHERE&lt;/SPAN&gt; &lt;SPAN class=""&gt;CAST&lt;/SPAN&gt;(A.COL1 &lt;SPAN class=""&gt;AS&lt;/SPAN&gt; STRING) &lt;SPAN class=""&gt;!=&lt;/SPAN&gt; &lt;SPAN class=""&gt;CAST&lt;/SPAN&gt;(B.COL2 &lt;SPAN class=""&gt;AS&lt;/SPAN&gt; STRING) &lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;→ Still inconsistent&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Checked intermediate output (before WHERE filter)&lt;/STRONG&gt;:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Joins return expected row count consistently&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;The issue starts only &lt;STRONG&gt;when COL1 != COL2 filter is applied&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Tried limiting to one EAN_CODE&lt;/STRONG&gt;:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Correct behavior with expected results&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Problem only occurs with full data volume&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/OL&gt;</description>
    <pubDate>Fri, 20 Jun 2025 16:55:52 GMT</pubDate>
    <dc:creator>maarko</dc:creator>
    <dc:date>2025-06-20T16:55:52Z</dc:date>
    <item>
      <title>Inconsistent Decimal Comparison Behavior Between SQL Warehouse (Photon) and Spark Clusters</title>
      <link>https://community.databricks.com/t5/data-engineering/inconsistent-decimal-comparison-behavior-between-sql-warehouse/m-p/122378#M46756</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm seeing non-deterministic behavior when running the same query in SQL Warehouse (Photon) vs. interactive/job clusters (non-Photon), specifically involving a LEFT OUTER JOIN and a DECIMAL comparison in a WHERE clause.&lt;/P&gt;&lt;H3&gt;&amp;nbsp;&lt;/H3&gt;&lt;P&gt;I have two views:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;View A: contains COL1 → DECIMAL(21,5)&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;View B: contains COL2 → DECIMAL(38,6)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;I’m joining them on two keys and comparing the decimal values in a filter:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;sql&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN&gt;&lt;SPAN class=""&gt;SELECT&lt;/SPAN&gt; ... &lt;SPAN class=""&gt;FROM&lt;/SPAN&gt; A &lt;SPAN class=""&gt;LEFT&lt;/SPAN&gt; &lt;SPAN class=""&gt;OUTER&lt;/SPAN&gt; &lt;SPAN class=""&gt;JOIN&lt;/SPAN&gt; B &lt;SPAN class=""&gt;ON&lt;/SPAN&gt; A.key1 &lt;SPAN class=""&gt;=&lt;/SPAN&gt; B.key1 &lt;SPAN class=""&gt;AND&lt;/SPAN&gt; A.key2 &lt;SPAN class=""&gt;=&lt;/SPAN&gt; B.key2 &lt;SPAN class=""&gt;WHERE&lt;/SPAN&gt; A.COL1 &lt;SPAN class=""&gt;!=&lt;/SPAN&gt; B.COL2&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;HR /&gt;&lt;H3&gt;&amp;nbsp;Unexpected Behavior&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;When I run this in &lt;STRONG&gt;SQL Warehouse (Photon)&lt;/STRONG&gt;: I get a consistent and correct result.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;When I run the exact same query in a &lt;STRONG&gt;non-Photon interactive or job cluster&lt;/STRONG&gt;:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;I get &lt;STRONG&gt;inconsistent results across runs&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;The output row count changes on each run&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;When I &lt;STRONG&gt;narrow the filter to a specific key&lt;/STRONG&gt;, e.g., EAN_CODE = '1234567890', the result is correct — but on the full dataset, results are wrong or unstable&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;HR /&gt;&lt;H3&gt;&amp;nbsp;What I’ve Tried&lt;/H3&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Casting both columns to the same type&lt;/STRONG&gt;:&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;sql&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN&gt;&lt;SPAN class=""&gt;WHERE&lt;/SPAN&gt; &lt;SPAN class=""&gt;CAST&lt;/SPAN&gt;(A.COL1 &lt;SPAN class=""&gt;AS&lt;/SPAN&gt; &lt;SPAN class=""&gt;DECIMAL&lt;/SPAN&gt;(&lt;SPAN class=""&gt;38&lt;/SPAN&gt;,&lt;SPAN class=""&gt;6&lt;/SPAN&gt;)) &lt;SPAN class=""&gt;!=&lt;/SPAN&gt; &lt;SPAN class=""&gt;CAST&lt;/SPAN&gt;(B.COL2 &lt;SPAN class=""&gt;AS&lt;/SPAN&gt; &lt;SPAN class=""&gt;DECIMAL&lt;/SPAN&gt;(&lt;SPAN class=""&gt;38&lt;/SPAN&gt;,&lt;SPAN class=""&gt;6&lt;/SPAN&gt;)) &lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;→ Still inconsistent&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Comparing as strings&lt;/STRONG&gt;:&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;sql&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN&gt;&lt;SPAN class=""&gt;WHERE&lt;/SPAN&gt; &lt;SPAN class=""&gt;CAST&lt;/SPAN&gt;(A.COL1 &lt;SPAN class=""&gt;AS&lt;/SPAN&gt; STRING) &lt;SPAN class=""&gt;!=&lt;/SPAN&gt; &lt;SPAN class=""&gt;CAST&lt;/SPAN&gt;(B.COL2 &lt;SPAN class=""&gt;AS&lt;/SPAN&gt; STRING) &lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;→ Still inconsistent&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Checked intermediate output (before WHERE filter)&lt;/STRONG&gt;:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Joins return expected row count consistently&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;The issue starts only &lt;STRONG&gt;when COL1 != COL2 filter is applied&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Tried limiting to one EAN_CODE&lt;/STRONG&gt;:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Correct behavior with expected results&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Problem only occurs with full data volume&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Fri, 20 Jun 2025 16:55:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/inconsistent-decimal-comparison-behavior-between-sql-warehouse/m-p/122378#M46756</guid>
      <dc:creator>maarko</dc:creator>
      <dc:date>2025-06-20T16:55:52Z</dc:date>
    </item>
    <item>
      <title>Re: Inconsistent Decimal Comparison Behavior Between SQL Warehouse (Photon) and Spark Clusters</title>
      <link>https://community.databricks.com/t5/data-engineering/inconsistent-decimal-comparison-behavior-between-sql-warehouse/m-p/122467#M46783</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/171023"&gt;@maarko&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is a fascinating issue that points to several potential causes related to differences between&lt;BR /&gt;Photon and standard Spark execution engines, particularly around decimal handling and parallelism.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Root Causes&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;1. Decimal Precision and Scale Handling&lt;/STRONG&gt;&lt;BR /&gt;The different decimal types (21,5) vs (38,6) can cause implicit conversions that behave differently between Photon and standard Spark. Even after explicit casting, the underlying comparison logic may differ.&lt;BR /&gt;&lt;STRONG&gt;2. Parallel Execution Non-Determinism&lt;/STRONG&gt;&lt;BR /&gt;Non-Photon clusters use standard Spark's parallel execution, which can introduce non-deterministic behavior when:&lt;BR /&gt;- Hash-based operations encounter hash collisions&lt;BR /&gt;- Floating-point arithmetic operations are reordered&lt;BR /&gt;- Memory pressure causes different execution paths&lt;BR /&gt;&lt;STRONG&gt;3. NULL Handling in LEFT OUTER JOIN&lt;/STRONG&gt;&lt;BR /&gt;With LEFT OUTER JOIN, B.COL2 can be NULL for unmatched rows.&lt;BR /&gt;The comparison A.COL1 != B.COL2 behaves differently when NULLs are involved, and this handling might differ between engines.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Recommended Solutions&lt;/STRONG&gt;&lt;BR /&gt;Solution 1: Explicit NULL Handling&lt;BR /&gt;Solution 2: Use COALESCE for Consistent Comparison&lt;BR /&gt;Solution 3: Deterministic Ordering&lt;BR /&gt;Solution 4: Repartition for Consistency&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 22 Jun 2025 16:39:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/inconsistent-decimal-comparison-behavior-between-sql-warehouse/m-p/122467#M46783</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-06-22T16:39:51Z</dc:date>
    </item>
  </channel>
</rss>

