Re: Result Difference Between View and Manually Ru... - Databricks Community

There are two fixes that I can think of:

Option A: Make first_value deterministic

```
first_value(Customer_ID, true) OVER (
  PARTITION BY customer_name
  ORDER BY submitted ASC, event_id ASC
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
```

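- Order by the full timestamp submitted, not date(submitted): truncating to a date makes every row from the same day tie.
- Add a stable tiebreaker (event_id, record id, etc.) so rows with identical timestamps always sort the same way.
- The second argument, true, makes first_value ignore NULLs.
- The explicit ROWS frame replaces Spark's default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when ORDER BY is present), so every row in the partition sees the same full window regardless of ties.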
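
To see the points above in one place, here is a minimal, self-contained sketch of Option A. The inline rows, the events name, and the first_customer_id alias are made up for illustration and are not taken from the original view; only the window expression itself comes from this post.

```sql
-- Hypothetical sample data: three rows for one customer, two of them
-- sharing the same timestamp and one with a NULL Customer_ID.
WITH events AS (
  SELECT * FROM VALUES
    (1, 'Acme', NULL,    TIMESTAMP '2024-01-01 09:00:00'),
    (2, 'Acme', 'C-100', TIMESTAMP '2024-01-01 10:00:00'),
    (3, 'Acme', 'C-200', TIMESTAMP '2024-01-01 10:00:00')
  AS t(event_id, customer_name, Customer_ID, submitted)
)
SELECT
  event_id,
  customer_name,
  -- true = ignore NULLs; event_id breaks the tie between rows 2 and 3;
  -- the full ROWS frame gives every row the same window.
  first_value(Customer_ID, true) OVER (
    PARTITION BY customer_name
    ORDER BY submitted ASC, event_id ASC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS first_customer_id
FROM events;
```

With this sample data every Acme row should come back with C-100: the NULL on event 1 is skipped, and event 2 sorts ahead of event 3 because of the event_id tiebreaker.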

Option B: Use row_number() instead

If you only need the “first” row deterministically:
```
row_number() OVER (
  PARTITION BY customer_name
  ORDER BY submitted_ts ASC, event_id ASC
) AS rn
```
Then select, or propagate, the value from the row where rn = 1.
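
As a rough sketch of how that last step could look end to end, the query below ranks the rows and keeps rn = 1 per customer. The inline rows, the events and ranked names, and the first_customer_id alias are hypothetical. The WHERE Customer_ID IS NOT NULL filter is my assumption, added to mirror the ignore-NULLs behaviour of Option A; drop it if NULL is an acceptable "first" value.

```sql
-- Hypothetical sample data, same shape as the expression above expects.
WITH events AS (
  SELECT * FROM VALUES
    (1, 'Acme', NULL,    TIMESTAMP '2024-01-01 09:00:00'),
    (2, 'Acme', 'C-100', TIMESTAMP '2024-01-01 10:00:00'),
    (3, 'Acme', 'C-200', TIMESTAMP '2024-01-01 10:00:00')
  AS t(event_id, customer_name, Customer_ID, submitted_ts)
),
ranked AS (
  SELECT
    customer_name,
    Customer_ID,
    -- Timestamp first, then event_id as the stable tiebreaker.
    row_number() OVER (
      PARTITION BY customer_name
      ORDER BY submitted_ts ASC, event_id ASC
    ) AS rn
  FROM events
  WHERE Customer_ID IS NOT NULL  -- assumption: skip NULLs, as in Option A
)
-- Keep exactly one row per customer.
SELECT customer_name, Customer_ID AS first_customer_id
FROM ranked
WHERE rn = 1;
```

With this sample data the result should be a single row, Acme with C-100. To propagate the value to every row instead of selecting it, the rn = 1 result can be joined back to the source on customer_name.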