topic Re: Result Difference Between View and Manually Run View Query in Data Engineering

Result Difference Between View and Manually Run View Query

ChrisRose — Thu, 08 Jan 2026 13:39:52 GMT

I am experiencing an issue where a view does not display the correct results, but running the view query manually in either a new notebook or the SQL Editor displays different, correct results. I have tried switching the compute resource in the notebook to match the view creation job's SQL Warehouse and I have also restarted the SQL Warehouse with the same issue still existing. I also have dropped the view, created a new view with the same code and the results are still incorrect.

Re: Result Difference Between View and Manually Run View Query

ckunal_eng — Thu, 08 Jan 2026 14:01:20 GMT

@ChrisRose Can you please elaborate on the part when you say the view does not show correct results. Are you performing any computation or aggregation. Can you share the view query if possible.

Re: Result Difference Between View and Manually Run View Query

K_Anudeep — Thu, 08 Jan 2026 14:02:00 GMT

Hello @ChrisRose ,

The only reason i could think of why this could happen is because of CACHE ,but in that case, restarting the warehouse should have solved the issue. Could you provide steps to reproduce this issue so that i can check internally

Re: Result Difference Between View and Manually Run View Query

ChrisRose — Thu, 08 Jan 2026 14:13:01 GMT

The issue is with a calculated field using the first_value aggregate function invoked as a window function with the OVER clause -

first_value(Customer_ID) OVER (PARTITION BY customer_name ORDER BY date(submitted) ASC) as Customer_ID.

The field calculates correctly in a notebook or SQL Editor, but when run in a job via notebook the value comes back null for a particular record. It does not happen for all records. Please let me know if that is enough information, or if you need more to go on.

Re: Result Difference Between View and Manually Run View Query

bianca_unifeye — Thu, 08 Jan 2026 14:25:12 GMT

What you’re seeing is very consistent with first_value being non-deterministic when used as a window function unless your window ordering is fully deterministic and you handle NULLs explicitly. Databricks’ own function docs call out that first_value is non-deterministic. https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/first_value

In your case there are two common “gotchas” that explain “works in SQL editor / notebook, but sometimes NULL in a job”:

1) Ties introduced by date(submitted)

You’re ordering by date(submitted) (day precision). If a customer has multiple rows on the same day, then there are ties in the ORDER BY. With ties, the “first row” within the day is not guaranteed unless you add a tiebreaker — different execution plans (interactive vs job) can pick a different physical row, and if the earliest physical row has Customer_ID = NULL, you’ll get NULL.

Fix: order by a true timestamp + a unique tiebreaker (event id / ingestion id / primary key).

2) NULL handling

By default first_value(expr) respects nulls, so if the first row (per ordering) has Customer_ID NULL, the result is NULL. Databricks supports an ignoreNull boolean argument (or IGNORE NULLS semantics in the docs) for first_value

Re: Result Difference Between View and Manually Run View Query

ckunal_eng — Thu, 08 Jan 2026 14:30:32 GMT

@ChrisRose to add to whatever has been said, nondeterministic ordering seems to be reason. You can check using this query :

SELECT
customer_name,
DATE(submitted),
COUNT(*)
FROM your_view
GROUP BY customer_name, DATE(submitted)
HAVING COUNT(*) > 1

If the earliest submitted date for that customer_name has customer_id = NULL, then SQL editor/notebook may appear correct, hwoever since job runs show full data if the first row is NULL, result is also NULL.

You can confirm using this query:

SELECT
customer_name,
submitted,
Customer_ID
FROM your_view
WHERE customer_name = '<problem_customer>'
ORDER BY DATE(submitted) ASC

Additionally to resolve this you can add a tie breaker to the query as given below:

FIRST_VALUE(Customer_ID IGNORE NULLS)
OVER (
PARTITION BY customer_name
ORDER BY DATE(submitted) ASC, submitted ASC, record_id ASC
) AS Customer_ID

Re: Result Difference Between View and Manually Run View Query

bianca_unifeye — Thu, 08 Jan 2026 14:34:51 GMT

There are 2 fixes that I can think off

Option A: Make first_value deterministic

first_value(Customer_ID, true) OVER (
  PARTITION BY customer_name
  ORDER BY submitted ASC, event_id ASC
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

Use a timestamp submitted, not date()
Add a stable tiebreaker (event_id, record id, etc.)
true → ignores NULLs
Explicit ROWS frame avoids Spark’s default RANGE behaviour with ties
Option B : Use row_number() instead
If you only need the “first” row deterministically:
```
row_number() OVER (
  PARTITION BY customer_name
  ORDER BY submitted_ts ASC, event_id ASC
)
```
Then select or propagate the value from rn = 1.

topic Re: Result Difference Between View and Manually Run View Query in Data Engineering

Result Difference Between View and Manually Run View Query

Re: Result Difference Between View and Manually Run View Query

Re: Result Difference Between View and Manually Run View Query

Re: Result Difference Between View and Manually Run View Query

Re: Result Difference Between View and Manually Run View Query

1) Ties introduced by date(submitted)

2) NULL handling

Re: Result Difference Between View and Manually Run View Query

Re: Result Difference Between View and Manually Run View Query

Option A: Make first_value deterministic

Option B : Use row_number() instead