Databricks Community

alexisjohnson · ‎11-18-2021

Hi, I'm wondering if this is the expected behavior when using last or last_value in a window function? I've written a query like this:

select
    col1,
    col2,
    last_value(col2) over (partition by col1 order by col2) as column2_last
from values
    (1, 10), (1, 11), (1, 12),
    (2, 20), (2, 21), (2, 22);

In Snowflake I get the following results. The behavior feels right, since we're partitioning by col1 and ordering by col2 (https://docs.snowflake.com/en/sql-reference/functions/last_value.html).

Screen Shot 2021-11-18 at 12.48.25 PM

In Databricks, I get:

Screen Shot 2021-11-18 at 12.48.32 PM It seems like the ORDER BY doesn't get applied... Perhaps I am writing the query incorrectly?

Any insight into this behavior would be very helpful. Also, I'm curious why last and last_value are synonyms? https://docs.databricks.com/sql/language-manual/functions/last.html

Thanks!

Anonymous · ‎11-19-2021

@Alexis Johnson - Here is where you can open a support ticket:

Contact Us

View solution in original post

Hubert-Dudek · ‎11-18-2021

yes indeed it looks like a bug with LAST_VALUE,

Min gives correct results:

Max hmm only when desc is specified (shouldn't affect as max is max but only with descending sorting it works as should be):

for last_value is exactly like on your example (so nothing helps - I tried many ways).

@Kaniz Fatma maybe someone from inside databricks could look at it

alexisjohnson · ‎11-18-2021

Agreed. Interesting find that `max` behaves funny when `desc` isn't specified.

I'd also note that `first_value` yields the correct result.

@Kaniz Fatma Is there a process for filing bugs in Databricks? It'd be great to get this fixed.

Anonymous · ‎11-19-2021

@Alexis Johnson - Here is where you can open a support ticket:

Contact Us

alexisjohnson · ‎11-18-2021

Thank you for your help, @Hubert Dudek !

Carv · ‎07-12-2023

For those stumbling across this; it seems LAST_VALUE emulates the same functionality as it does in SQL Server which does not, in most people's minds, have a proper row/range frame for the window. You can adjust it with the below syntax.

I understand last_value emulating what is in my mind a mistake from sql_server but I don't understand why last() (an alias for last_value) has to use the same window row range.

SELECT
    col1
  , col2
  , LAST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 ASC) as column2_last
  , LAST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as column2_last_unbound
  , FIRST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 ASC) as column2_first
  , FIRST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 DESC) as column2_first_reversed
FROM values
  (1,10), (1,12), (1,11),
  (2,20), (2,21), (2,22);