cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Window function using last/last_value with PARTITION BY/ORDER BY has unexpected results

alexisjohnson
New Contributor III

Hi, I'm wondering if this is the expected behavior when using last or last_value in a window function? I've written a query like this:

select
    col1,
    col2,
    last_value(col2) over (partition by col1 order by col2) as column2_last
from values
    (1, 10), (1, 11), (1, 12),
    (2, 20), (2, 21), (2, 22);

 In Snowflake I get the following results. The behavior feels right, since we're partitioning by col1 and ordering by col2 (https://docs.snowflake.com/en/sql-reference/functions/last_value.html).

Screen Shot 2021-11-18 at 12.48.25 PM 

In Databricks, I get:

Screen Shot 2021-11-18 at 12.48.32 PMIt seems like the ORDER BY doesn't get applied... Perhaps I am writing the query incorrectly?

Any insight into this behavior would be very helpful. Also, I'm curious why last and last_value are synonyms? https://docs.databricks.com/sql/language-manual/functions/last.html

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions

Anonymous
Not applicable

@Alexis Johnson​ - Here is where you can open a support ticket:

Contact Us

View solution in original post

7 REPLIES 7

Kaniz
Community Manager
Community Manager

Hi @ alexisjohnson! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else I will get back to you soon. Thanks.

Hubert-Dudek
Esteemed Contributor III

yes indeed it looks like a bug with LAST_VALUE,

Min gives correct results:

image.png 

Max hmm only when desc is specified (shouldn't affect as max is max but only with descending sorting it works as should be):

image.png 

for last_value is exactly like on your example (so nothing helps - I tried many ways).

@Kaniz Fatma​ maybe someone from inside databricks could look at it

Agreed. Interesting find that `max` behaves funny when `desc` isn't specified.

I'd also note that `first_value` yields the correct result.

@Kaniz Fatma​ Is there a process for filing bugs in Databricks? It'd be great to get this fixed.

Hi @Alexis Johnson​ , You can open a support ticket with Databricks in order to fix this.

Anonymous
Not applicable

@Alexis Johnson​ - Here is where you can open a support ticket:

Contact Us

Thank you for your help, @Hubert Dudek​ !

Carv
Visitor II

For those stumbling across this; it seems LAST_VALUE emulates the same functionality as it does in SQL Server which does not, in most people's minds, have a proper row/range frame for the window. You can adjust it with the below syntax.

I understand last_value emulating what is in my mind a mistake from sql_server but I don't understand why last() (an alias for last_value) has to use the same window row range.

Carv_0-1689191879502.png

SELECT
    col1
  , col2
  , LAST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 ASC) as column2_last
  , LAST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as column2_last_unbound
  , FIRST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 ASC) as column2_first
  , FIRST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 DESC) as column2_first_reversed
FROM values
  (1,10), (1,12), (1,11),
  (2,20), (2,21), (2,22);

 

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.