<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks SQL and Engineer Notebooks yields different outputs from same script in Warehousing &amp; Analytics</title>
    <link>https://community.databricks.com/t5/warehousing-analytics/databricks-sql-and-engineer-notebooks-yields-different-outputs/m-p/43036#M887</link>
    <description>&lt;P&gt;UPDATE:&amp;nbsp;&lt;/P&gt;&lt;P&gt;The different outputs seem to be caused by the use of the LAST function in Databricks SQL.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;final as (select 

Skolekontrakt,
SchoolYear,
sale_id,
Dato,
last(contact_id) contact_id ,
last(SisteVerdi) SisteVerdi ,
last(Status) Status ,
last(SistOppdatert) SistOppdatert ,
Webbestilling01,
Medarbeider,
SalgsenhetKat,
SalgskortOpprettet,
SalgsDato,
Avbruddsdato,
ZipCode

from HovedTABELL

group by
Skolekontrakt,
SchoolYear,
sale_id,
Webbestilling01,
Medarbeider,
SalgsenhetKat,
SalgskortOpprettet,
SalgsDato,
Avbruddsdato,
Dato,
ZipCode)

select
  H.*,
  year(H.SalgsDato) SignYear,
  weekofyear(date(H.SalgsDato)) SignWeek,
  weekofyear(H.Dato)
from
  final H
where
  H.Status in (10,11)
  and year(H.SalgsDato) = year(H.Dato) --and weekofyear(H.SalgsDato)=weekofyear(Dato)-1
  --VED STATUS I SALGSPERIODER--
  and int(weekofyear(date(H.SalgsDato))) &amp;lt; int(weekofyear(H.Dato)) &lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So, removing the GROUP BY clause yields the same results in the notebook and in Databricks SQL. Using another aggregate with GROUP BY, for example COUNT, also produces the same output in both. Only LAST produces different outputs.&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;are you aware of any difference between using LAST in a notebook and in Databricks SQL? Anyone else?&lt;/P&gt;</description>
    <pubDate>Fri, 01 Sep 2023 07:03:05 GMT</pubDate>
    <dc:creator>mortenhaga</dc:creator>
    <dc:date>2023-09-01T07:03:05Z</dc:date>
    <item>
      <title>Databricks SQL and Engineer Notebooks yields different outputs from same script</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/databricks-sql-and-engineer-notebooks-yields-different-outputs/m-p/42943#M882</link>
      <description>&lt;P&gt;Hi all&lt;/P&gt;&lt;P&gt;We are seeing an alarming issue: the same script yields different output in Databricks SQL than in a notebook. The correct output is 8625 rows, which is what the notebook returns, but Databricks SQL returns 156 rows. The script uses widgets in both the notebook and Databricks SQL, but we have tried hardcoded values as well.&amp;nbsp;&lt;/P&gt;&lt;P&gt;The script is too large to paste here, so please get in touch with me to obtain it.&lt;/P&gt;&lt;P&gt;We have no idea why this is happening. Can anyone please help?&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Serverless cluster info"&gt;&lt;img src="https://community.databricks.com/skins/images/2E818018961E7F2BE1E8ADBF673C3CC8/responsive_peak/images/image_unmoderated.gif" alt="Serverless cluster info" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Notebook cluster info"&gt;&lt;img src="https://community.databricks.com/skins/images/2E818018961E7F2BE1E8ADBF673C3CC8/responsive_peak/images/image_unmoderated.gif" alt="Notebook cluster info" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Wrong output in notebook attached to serverless cluster"&gt;&lt;img src="https://community.databricks.com/skins/images/2E818018961E7F2BE1E8ADBF673C3CC8/responsive_peak/images/image_unmoderated.gif" alt="Wrong output in notebook attached to serverless cluster" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Wrong output in SQL serverless"&gt;&lt;img src="https://community.databricks.com/skins/images/2E818018961E7F2BE1E8ADBF673C3CC8/responsive_peak/images/image_unmoderated.gif" alt="Wrong output in SQL serverless" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Correct output in notebook with notebook compute/cluster"&gt;&lt;img src="https://community.databricks.com/skins/images/2E818018961E7F2BE1E8ADBF673C3CC8/responsive_peak/images/image_unmoderated.gif" alt="Correct output in notebook with notebook compute/cluster" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 31 Aug 2023 08:42:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/databricks-sql-and-engineer-notebooks-yields-different-outputs/m-p/42943#M882</guid>
      <dc:creator>mortenhaga</dc:creator>
      <dc:date>2023-08-31T08:42:36Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks SQL and Engineer Notebooks yields different outputs from same script</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/databricks-sql-and-engineer-notebooks-yields-different-outputs/m-p/42970#M885</link>
      <description>&lt;P&gt;Hi Kaniz&lt;/P&gt;&lt;P&gt;Thanks for taking the time to reply.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Widgets: widgets are not part of the problem, as we have tested hardcoded values (strings and dates) in place of the widgets. Still no luck.&lt;/P&gt;&lt;P&gt;Cluster: Regarding your tip on cluster configuration, a serverless cluster has very few configuration options compared with a notebook cluster, so I'm not sure how we can align the clusters at all.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Dependencies/Libraries: We use only built-in libraries and dependencies on the notebook cluster and, again, since the serverless cluster has limited configuration options, this does not seem to be the issue.&lt;/P&gt;&lt;P&gt;Clearing output: We have tried that.&lt;/P&gt;&lt;P&gt;Where can I get in touch with a Databricks expert?&lt;/P&gt;</description>
      <pubDate>Thu, 31 Aug 2023 11:27:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/databricks-sql-and-engineer-notebooks-yields-different-outputs/m-p/42970#M885</guid>
      <dc:creator>mortenhaga</dc:creator>
      <dc:date>2023-08-31T11:27:22Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks SQL and Engineer Notebooks yields different outputs from same script</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/databricks-sql-and-engineer-notebooks-yields-different-outputs/m-p/43036#M887</link>
      <description>&lt;P&gt;UPDATE:&amp;nbsp;&lt;/P&gt;&lt;P&gt;The different outputs seem to be caused by the use of the LAST function in Databricks SQL.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;final as (select 

Skolekontrakt,
SchoolYear,
sale_id,
Dato,
last(contact_id) contact_id ,
last(SisteVerdi) SisteVerdi ,
last(Status) Status ,
last(SistOppdatert) SistOppdatert ,
Webbestilling01,
Medarbeider,
SalgsenhetKat,
SalgskortOpprettet,
SalgsDato,
Avbruddsdato,
ZipCode

from HovedTABELL

group by
Skolekontrakt,
SchoolYear,
sale_id,
Webbestilling01,
Medarbeider,
SalgsenhetKat,
SalgskortOpprettet,
SalgsDato,
Avbruddsdato,
Dato,
ZipCode)

select
  H.*,
  year(H.SalgsDato) SignYear,
  weekofyear(date(H.SalgsDato)) SignWeek,
  weekofyear(H.Dato)
from
  final H
where
  H.Status in (10,11)
  and year(H.SalgsDato) = year(H.Dato) --and weekofyear(H.SalgsDato)=weekofyear(Dato)-1
  --VED STATUS I SALGSPERIODER--
  and int(weekofyear(date(H.SalgsDato))) &amp;lt; int(weekofyear(H.Dato)) &lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So, removing the GROUP BY clause yields the same results in the notebook and in Databricks SQL. Using another aggregate with GROUP BY, for example COUNT, also produces the same output in both. Only LAST produces different outputs.&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;are you aware of any difference between using LAST in a notebook and in Databricks SQL? Anyone else?&lt;/P&gt;</description>
      <pubDate>Fri, 01 Sep 2023 07:03:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/databricks-sql-and-engineer-notebooks-yields-different-outputs/m-p/43036#M887</guid>
      <dc:creator>mortenhaga</dc:creator>
      <dc:date>2023-09-01T07:03:05Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks SQL and Engineer Notebooks yields different outputs from same script</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/databricks-sql-and-engineer-notebooks-yields-different-outputs/m-p/43630#M903</link>
      <description>&lt;P&gt;UPDATE:&lt;BR /&gt;&lt;SPAN&gt;I think we have identified and solved the issue. It seems that using LAST in Databricks SQL requires explicitly setting the "ignoreNull" argument and being careful to use the correct data type.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;I guess this is because Databricks SQL uses ANSI mode. When using LAST in a notebook, all of this is taken care of by Spark under the hood.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;With this in mind, we might just use notebooks exclusively, instead of Databricks SQL, as our go-to query UI, even for simple queries, so that we are 100% sure that we get correct results.&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 05 Sep 2023 12:21:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/databricks-sql-and-engineer-notebooks-yields-different-outputs/m-p/43630#M903</guid>
      <dc:creator>mortenhaga</dc:creator>
      <dc:date>2023-09-05T12:21:25Z</dc:date>
    </item>
  </channel>
</rss>

