<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Result Difference Between View and Manually Run View Query in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/result-difference-between-view-and-manually-run-view-query/m-p/143359#M52156</link>
    <description>&lt;P&gt;What you’re seeing is very consistent with &lt;STRONG&gt;first_value being non-deterministic&lt;/STRONG&gt; when used as a window function unless your window ordering is &lt;EM&gt;fully deterministic&lt;/EM&gt; and you handle NULLs explicitly. Databricks’ own function docs call out that first_value is &lt;STRONG&gt;non-deterministic&lt;/STRONG&gt;.&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/first_value" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/first_value&lt;/A&gt;&lt;/P&gt;&lt;P&gt;In your case there are two common “gotchas” that explain “works in SQL editor / notebook, but sometimes NULL in a job”:&lt;/P&gt;&lt;H2&gt;1) Ties introduced by date(submitted)&lt;/H2&gt;&lt;P&gt;You’re ordering by date(submitted) (day precision). If a customer has &lt;STRONG&gt;multiple rows on the same day&lt;/STRONG&gt;, then there are ties in the ORDER BY. With ties, the “first row” within the day is not guaranteed unless you add a tiebreaker — different execution plans (interactive vs job) can pick a different physical row, and if the earliest physical row has Customer_ID = NULL, you’ll get NULL.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Fix:&lt;/STRONG&gt; order by a true timestamp + a unique tiebreaker (event id / ingestion id / primary key).&lt;/P&gt;&lt;H2&gt;2) NULL handling&lt;/H2&gt;&lt;P&gt;By default first_value(expr) &lt;STRONG&gt;respects nulls&lt;/STRONG&gt;, so if the first row (per ordering) has Customer_ID NULL, the result is NULL. Databricks supports an ignoreNull boolean argument (or IGNORE NULLS semantics in the docs) for first_value&lt;/P&gt;</description>
    <pubDate>Thu, 08 Jan 2026 14:25:12 GMT</pubDate>
    <dc:creator>bianca_unifeye</dc:creator>
    <dc:date>2026-01-08T14:25:12Z</dc:date>
    <item>
      <title>Result Difference Between View and Manually Run View Query</title>
      <link>https://community.databricks.com/t5/data-engineering/result-difference-between-view-and-manually-run-view-query/m-p/143346#M52149</link>
      <description>&lt;P&gt;I am experiencing an issue where a view does not display the correct results, but running the view query manually in either a new notebook or the SQL Editor displays different, correct results. I have tried switching the compute resource in the notebook to match the view creation job's SQL Warehouse and I have also restarted the SQL Warehouse with the same issue still existing. I also have dropped the view, created a new view with the same code and the results are still incorrect.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Jan 2026 13:39:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/result-difference-between-view-and-manually-run-view-query/m-p/143346#M52149</guid>
      <dc:creator>ChrisRose</dc:creator>
      <dc:date>2026-01-08T13:39:52Z</dc:date>
    </item>
    <item>
      <title>Re: Result Difference Between View and Manually Run View Query</title>
      <link>https://community.databricks.com/t5/data-engineering/result-difference-between-view-and-manually-run-view-query/m-p/143349#M52150</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/204368"&gt;@ChrisRose&lt;/a&gt;&amp;nbsp; Can you please elaborate on the part when you say the view does not show correct results. Are you performing any computation or aggregation. Can you share the view query if possible.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Jan 2026 14:01:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/result-difference-between-view-and-manually-run-view-query/m-p/143349#M52150</guid>
      <dc:creator>ckunal_eng</dc:creator>
      <dc:date>2026-01-08T14:01:20Z</dc:date>
    </item>
    <item>
      <title>Re: Result Difference Between View and Manually Run View Query</title>
      <link>https://community.databricks.com/t5/data-engineering/result-difference-between-view-and-manually-run-view-query/m-p/143350#M52151</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/204368"&gt;@ChrisRose&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;The only reason i could think of why this could happen is because of CACHE ,but in that case, restarting the warehouse should have solved the issue. Could you provide steps to reproduce this issue so that i can check internally&lt;/P&gt;</description>
      <pubDate>Thu, 08 Jan 2026 14:02:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/result-difference-between-view-and-manually-run-view-query/m-p/143350#M52151</guid>
      <dc:creator>K_Anudeep</dc:creator>
      <dc:date>2026-01-08T14:02:00Z</dc:date>
    </item>
    <item>
      <title>Re: Result Difference Between View and Manually Run View Query</title>
      <link>https://community.databricks.com/t5/data-engineering/result-difference-between-view-and-manually-run-view-query/m-p/143356#M52153</link>
      <description>&lt;P&gt;The issue is with a calculated field using the first_value aggregate function invoked as a window function with the OVER clause -&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;first_value&lt;/SPAN&gt;&lt;SPAN&gt;(Customer_ID) &lt;/SPAN&gt;&lt;SPAN&gt;OVER&lt;/SPAN&gt;&lt;SPAN&gt; (&lt;/SPAN&gt;&lt;SPAN&gt;PARTITION&lt;/SPAN&gt; &lt;SPAN&gt;BY&lt;/SPAN&gt;&lt;SPAN&gt; customer_name &lt;/SPAN&gt;&lt;SPAN&gt;ORDER BY&lt;/SPAN&gt; &lt;SPAN&gt;date&lt;/SPAN&gt;&lt;SPAN&gt;(submitted) &lt;/SPAN&gt;&lt;SPAN&gt;ASC&lt;/SPAN&gt;&lt;SPAN&gt;) &lt;/SPAN&gt;&lt;SPAN&gt;as&lt;/SPAN&gt;&lt;SPAN&gt; Customer_ID.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;The field calculates correctly in a notebook or SQL Editor, but when run in a job via notebook the value comes back null for a particular record. It does not happen for all records. Please let me know if that is enough information, or if you need more to go on.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 08 Jan 2026 14:13:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/result-difference-between-view-and-manually-run-view-query/m-p/143356#M52153</guid>
      <dc:creator>ChrisRose</dc:creator>
      <dc:date>2026-01-08T14:13:01Z</dc:date>
    </item>
    <item>
      <title>Re: Result Difference Between View and Manually Run View Query</title>
      <link>https://community.databricks.com/t5/data-engineering/result-difference-between-view-and-manually-run-view-query/m-p/143359#M52156</link>
      <description>&lt;P&gt;What you’re seeing is very consistent with &lt;STRONG&gt;first_value being non-deterministic&lt;/STRONG&gt; when used as a window function unless your window ordering is &lt;EM&gt;fully deterministic&lt;/EM&gt; and you handle NULLs explicitly. Databricks’ own function docs call out that first_value is &lt;STRONG&gt;non-deterministic&lt;/STRONG&gt;.&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/first_value" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/first_value&lt;/A&gt;&lt;/P&gt;&lt;P&gt;In your case there are two common “gotchas” that explain “works in SQL editor / notebook, but sometimes NULL in a job”:&lt;/P&gt;&lt;H2&gt;1) Ties introduced by date(submitted)&lt;/H2&gt;&lt;P&gt;You’re ordering by date(submitted) (day precision). If a customer has &lt;STRONG&gt;multiple rows on the same day&lt;/STRONG&gt;, then there are ties in the ORDER BY. With ties, the “first row” within the day is not guaranteed unless you add a tiebreaker — different execution plans (interactive vs job) can pick a different physical row, and if the earliest physical row has Customer_ID = NULL, you’ll get NULL.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Fix:&lt;/STRONG&gt; order by a true timestamp + a unique tiebreaker (event id / ingestion id / primary key).&lt;/P&gt;&lt;H2&gt;2) NULL handling&lt;/H2&gt;&lt;P&gt;By default first_value(expr) &lt;STRONG&gt;respects nulls&lt;/STRONG&gt;, so if the first row (per ordering) has Customer_ID NULL, the result is NULL. Databricks supports an ignoreNull boolean argument (or IGNORE NULLS semantics in the docs) for first_value&lt;/P&gt;</description>
      <pubDate>Thu, 08 Jan 2026 14:25:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/result-difference-between-view-and-manually-run-view-query/m-p/143359#M52156</guid>
      <dc:creator>bianca_unifeye</dc:creator>
      <dc:date>2026-01-08T14:25:12Z</dc:date>
    </item>
    <item>
      <title>Re: Result Difference Between View and Manually Run View Query</title>
      <link>https://community.databricks.com/t5/data-engineering/result-difference-between-view-and-manually-run-view-query/m-p/143362#M52158</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/204368"&gt;@ChrisRose&lt;/a&gt;&amp;nbsp; to add to whatever has been said,&amp;nbsp;nondeterministic ordering seems to be reason. You can check using this query :&amp;nbsp;&lt;/P&gt;&lt;P&gt;SELECT&lt;BR /&gt;customer_name,&lt;BR /&gt;DATE(submitted),&lt;BR /&gt;COUNT(*)&lt;BR /&gt;FROM your_view&lt;BR /&gt;GROUP BY customer_name, DATE(submitted)&lt;BR /&gt;HAVING COUNT(*) &amp;gt; 1&lt;/P&gt;&lt;P&gt;If the &lt;STRONG&gt;earliest submitted date&lt;/STRONG&gt; for that customer_name has customer_id = NULL, then&amp;nbsp;&lt;SPAN&gt;SQL editor/notebook may appear correct, hwoever since job runs show full data if the first row is NULL, result is also NULL.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;You can confirm using this query:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;SELECT&lt;BR /&gt;customer_name,&lt;BR /&gt;submitted,&lt;BR /&gt;Customer_ID&lt;BR /&gt;FROM your_view&lt;BR /&gt;WHERE customer_name = '&amp;lt;problem_customer&amp;gt;'&lt;BR /&gt;ORDER BY DATE(submitted) ASC&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Additionally to resolve this you can add a tie breaker to the query as given below:&lt;/P&gt;&lt;P&gt;FIRST_VALUE(Customer_ID IGNORE NULLS)&lt;BR /&gt;OVER (&lt;BR /&gt;PARTITION BY customer_name&lt;BR /&gt;ORDER BY DATE(submitted) ASC, submitted ASC, record_id ASC&lt;BR /&gt;) AS Customer_ID&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 08 Jan 2026 14:30:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/result-difference-between-view-and-manually-run-view-query/m-p/143362#M52158</guid>
      <dc:creator>ckunal_eng</dc:creator>
      <dc:date>2026-01-08T14:30:32Z</dc:date>
    </item>
    <item>
      <title>Re: Result Difference Between View and Manually Run View Query</title>
      <link>https://community.databricks.com/t5/data-engineering/result-difference-between-view-and-manually-run-view-query/m-p/143363#M52159</link>
      <description>&lt;P&gt;There are 2 fixes that I can think off&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H4&gt;&lt;STRONG&gt;Option A:&amp;nbsp; Make first_value deterministic&lt;/STRONG&gt;&lt;/H4&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;PRE&gt;&lt;SPAN&gt;&lt;SPAN class=""&gt;first_value(Customer_ID, &lt;SPAN class=""&gt;true) &lt;SPAN class=""&gt;OVER (
  &lt;SPAN class=""&gt;PARTITION &lt;SPAN class=""&gt;BY customer_name
  &lt;SPAN class=""&gt;ORDER &lt;SPAN class=""&gt;BY submitted &lt;SPAN class=""&gt;ASC, event_id &lt;SPAN class=""&gt;ASC
  &lt;SPAN class=""&gt;ROWS &lt;SPAN class=""&gt;BETWEEN UNBOUNDED PRECEDING &lt;SPAN class=""&gt;AND UNBOUNDED FOLLOWING
)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Use a &lt;STRONG&gt;timestamp submitted, not date()&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Add a &lt;STRONG&gt;stable tiebreaker (event_id, record id, etc.)&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;true → ignores NULLs&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Explicit ROWS frame avoids Spark’s default RANGE behaviour with ties&lt;/P&gt;&lt;H4&gt;&lt;STRONG&gt;Option B : Use row_number() instead&lt;/STRONG&gt;&lt;/H4&gt;&lt;P&gt;If you only need the “first” row deterministically:&lt;/P&gt;&lt;PRE&gt;&lt;SPAN&gt;&lt;SPAN class=""&gt;row_number() &lt;SPAN class=""&gt;OVER (
  &lt;SPAN class=""&gt;PARTITION &lt;SPAN class=""&gt;BY customer_name
  &lt;SPAN class=""&gt;ORDER &lt;SPAN class=""&gt;BY submitted_ts &lt;SPAN class=""&gt;ASC, event_id &lt;SPAN class=""&gt;ASC
)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;Then select or propagate the value from rn = 1.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 08 Jan 2026 14:34:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/result-difference-between-view-and-manually-run-view-query/m-p/143363#M52159</guid>
      <dc:creator>bianca_unifeye</dc:creator>
      <dc:date>2026-01-08T14:34:51Z</dc:date>
    </item>
  </channel>
</rss>

