- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-08-2026 06:25 AM
What you’re seeing is very consistent with first_value being non-deterministic when used as a window function unless your window ordering is fully deterministic and you handle NULLs explicitly. Databricks’ own function docs call out that first_value is non-deterministic. https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/first_value
In your case there are two common “gotchas” that explain “works in SQL editor / notebook, but sometimes NULL in a job”:
1) Ties introduced by date(submitted)
You’re ordering by date(submitted) (day precision). If a customer has multiple rows on the same day, then there are ties in the ORDER BY. With ties, the “first row” within the day is not guaranteed unless you add a tiebreaker — different execution plans (interactive vs job) can pick a different physical row, and if the earliest physical row has Customer_ID = NULL, you’ll get NULL.
Fix: order by a true timestamp + a unique tiebreaker (event id / ingestion id / primary key).
2) NULL handling
By default first_value(expr) respects nulls, so if the first row (per ordering) has Customer_ID NULL, the result is NULL. Databricks supports an ignoreNull boolean argument (or IGNORE NULLS semantics in the docs) for first_value