Inconsistent Decimal Comparison Behavior Between SQL Warehouse (Photon) and Spark Clusters

maarko
New Contributor II

 

I'm seeing non-deterministic behavior when running the same query in SQL Warehouse (Photon) vs. interactive/job clusters (non-Photon), specifically involving a LEFT OUTER JOIN and a DECIMAL comparison in a WHERE clause.

 

I have two views:

  • View A: contains COL1 → DECIMAL(21,5)

  • View B: contains COL2 → DECIMAL(38,6)

I’m joining them on two keys and comparing the decimal values in a filter:

 

```sql
SELECT ...
FROM A
LEFT OUTER JOIN B
  ON A.key1 = B.key1 AND A.key2 = B.key2
WHERE A.COL1 != B.COL2
```

Unexpected Behavior

  • When I run this in SQL Warehouse (Photon): I get a consistent and correct result.

  • When I run the exact same query in a non-Photon interactive or job cluster:

    • I get inconsistent results across runs

    • The output row count changes on each run

    • When I narrow the filter to a specific key, e.g., EAN_CODE = '1234567890', the result is correct. On the full dataset, however, results are wrong or unstable


 What I’ve Tried

  1. Casting both columns to the same type:

    ```sql
    WHERE CAST(A.COL1 AS DECIMAL(38,6)) != CAST(B.COL2 AS DECIMAL(38,6))
    ```

    → Still inconsistent

  2. Comparing as strings:

    ```sql
    WHERE CAST(A.COL1 AS STRING) != CAST(B.COL2 AS STRING)
    ```

    → Still inconsistent

  3. Checked intermediate output (before the WHERE filter):

    • The join returns the expected row count consistently

    • The issue starts only when the COL1 != COL2 filter is applied; a sketch of this kind of check follows this list

  4. Tried limiting to one EAN_CODE:

    • Correct behavior with expected results

    • Problem only occurs with full data volume
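
For reference, a check of the kind described in step 3 could look like the following (a hypothetical sketch against views A and B from above; the CASE buckets make the NULL outcome visible):

```sql
-- Bucket the comparison outcome per row; a plain != filter keeps only
-- the 'differs' bucket and silently drops the 'null_comparison' bucket.
SELECT
  CASE
    WHEN A.COL1 != B.COL2 THEN 'differs'
    WHEN A.COL1 =  B.COL2 THEN 'equal'
    ELSE 'null_comparison'            -- either side is NULL
  END      AS outcome,
  COUNT(*) AS row_count
FROM A
LEFT OUTER JOIN B
  ON A.key1 = B.key1 AND A.key2 = B.key2
GROUP BY 1
```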

1 REPLY

lingareddy_Alva
Honored Contributor II

Hi @maarko 

 

This is a fascinating issue that points to several potential causes related to differences between
Photon and standard Spark execution engines, particularly around decimal handling and parallelism.

Root Causes
1. Decimal Precision and Scale Handling
The different decimal types, DECIMAL(21,5) vs. DECIMAL(38,6), force an implicit widening to a common type before the comparison, and that conversion can behave differently between Photon and standard Spark. Even after explicit casting, the underlying comparison logic may differ.
2. Parallel Execution Non-Determinism
Non-Photon clusters use standard Spark's parallel execution, which can introduce non-deterministic behavior when:
- Hash-based operations encounter hash collisions
- Floating-point arithmetic operations are reordered
- Memory pressure causes different execution paths
3. NULL Handling in LEFT OUTER JOIN
With a LEFT OUTER JOIN, B.COL2 is NULL for unmatched rows.
The comparison A.COL1 != B.COL2 evaluates to NULL (not TRUE) when either side is NULL, so those rows are silently filtered out, and this handling might differ between engines.
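
As a quick illustration of that last point (a sketch, using Spark SQL's null-safe equality operator <=>):

```sql
-- Comparisons with NULL evaluate to NULL, and WHERE keeps only TRUE rows,
-- so a plain != silently drops every unmatched left-join row.
SELECT
  CAST(NULL AS DECIMAL(38,6)) != 1.5        AS plain_neq,     -- NULL: row dropped
  NOT (CAST(NULL AS DECIMAL(38,6)) <=> 1.5) AS null_safe_neq  -- TRUE: row kept
```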

Recommended Solutions
Solution 1: Explicit NULL Handling
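
A minimal sketch of this approach, reusing the views and columns from the question:

```sql
-- Keep rows where the values differ, plus rows where exactly one side
-- is NULL (the plain != comparison returns NULL for those and drops them).
SELECT A.key1, A.key2, A.COL1, B.COL2
FROM A
LEFT OUTER JOIN B
  ON A.key1 = B.key1 AND A.key2 = B.key2
WHERE (A.COL1 != B.COL2)
   OR (A.COL1 IS NULL AND B.COL2 IS NOT NULL)
   OR (A.COL1 IS NOT NULL AND B.COL2 IS NULL)
```

Equivalently, Spark SQL's null-safe operator lets you collapse all three predicates into one: WHERE NOT (A.COL1 <=> B.COL2).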
Solution 2: Use COALESCE for Consistent Comparison
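
One way to read this, assuming a sentinel value that can never occur in the real data:

```sql
-- Replace NULL with an impossible sentinel on both sides so the
-- comparison itself is never NULL. Cast to a common type first so the
-- DECIMAL(21,5) vs DECIMAL(38,6) widening is explicit.
SELECT A.key1, A.key2, A.COL1, B.COL2
FROM A
LEFT OUTER JOIN B
  ON A.key1 = B.key1 AND A.key2 = B.key2
WHERE COALESCE(CAST(A.COL1 AS DECIMAL(38,6)), -99999999999.999999)
   != COALESCE(B.COL2, -99999999999.999999)
```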
Solution 3: Deterministic Ordering
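
The reply does not spell this one out; one common reading is to impose an explicit output order so repeated runs can be compared line-for-line:

```sql
-- Without an explicit ORDER BY, Spark gives no guarantee about the
-- order rows are returned in, so run-to-run diffs are not meaningful.
SELECT A.key1, A.key2, A.COL1, B.COL2
FROM A
LEFT OUTER JOIN B
  ON A.key1 = B.key1 AND A.key2 = B.key2
WHERE NOT (A.COL1 <=> B.COL2)   -- null-safe inequality, see Solution 1
ORDER BY A.key1, A.key2
```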
Solution 4: Repartition for Consistency
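
A sketch using Spark SQL's REPARTITION hint (the partition count of 200 is an arbitrary example; tune it to your data volume):

```sql
-- Force a fixed shuffle before the join so partitioning does not vary
-- with cluster size or adaptive execution decisions between runs.
SELECT /*+ REPARTITION(200) */
       A.key1, A.key2, A.COL1, B.COL2
FROM A
LEFT OUTER JOIN B
  ON A.key1 = B.key1 AND A.key2 = B.key2
WHERE NOT (A.COL1 <=> B.COL2)
```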

 

 

LR
