cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Window function using last/last_value with PARTITION BY/ORDER BY has unexpected results

alexisjohnson
New Contributor III

Hi, I'm wondering if this is the expected behavior when using last or last_value in a window function? I've written a query like this:

select
    col1,
    col2,
    last_value(col2) over (partition by col1 order by col2) as column2_last
from values
    (1, 10), (1, 11), (1, 12),
    (2, 20), (2, 21), (2, 22);

 In Snowflake I get the following results. The behavior feels right, since we're partitioning by col1 and ordering by col2 (https://docs.snowflake.com/en/sql-reference/functions/last_value.html).

Screen Shot 2021-11-18 at 12.48.25 PM 

In Databricks, I get:

Screen Shot 2021-11-18 at 12.48.32 PMIt seems like the ORDER BY doesn't get applied... Perhaps I am writing the query incorrectly?

Any insight into this behavior would be very helpful. Also, I'm curious why last and last_value are synonyms? https://docs.databricks.com/sql/language-manual/functions/last.html

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions

Anonymous
Not applicable

@Alexis Johnson​ - Here is where you can open a support ticket:

Contact Us

View solution in original post

5 REPLIES 5

Hubert-Dudek
Esteemed Contributor III

yes indeed it looks like a bug with LAST_VALUE,

Min gives correct results:

image.png 

Max hmm only when desc is specified (shouldn't affect as max is max but only with descending sorting it works as should be):

image.png 

for last_value is exactly like on your example (so nothing helps - I tried many ways).

@Kaniz Fatma​ maybe someone from inside databricks could look at it

Agreed. Interesting find that `max` behaves funny when `desc` isn't specified.

I'd also note that `first_value` yields the correct result.

@Kaniz Fatma​ Is there a process for filing bugs in Databricks? It'd be great to get this fixed.

Anonymous
Not applicable

@Alexis Johnson​ - Here is where you can open a support ticket:

Contact Us

Thank you for your help, @Hubert Dudek​ !

Carv
New Contributor II

For those stumbling across this; it seems LAST_VALUE emulates the same functionality as it does in SQL Server which does not, in most people's minds, have a proper row/range frame for the window. You can adjust it with the below syntax.

I understand last_value emulating what is in my mind a mistake from sql_server but I don't understand why last() (an alias for last_value) has to use the same window row range.

Carv_0-1689191879502.png

SELECT
    col1
  , col2
  , LAST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 ASC) as column2_last
  , LAST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as column2_last_unbound
  , FIRST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 ASC) as column2_first
  , FIRST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 DESC) as column2_first_reversed
FROM values
  (1,10), (1,12), (1,11),
  (2,20), (2,21), (2,22);

 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group