โ11-18-2021 10:46 AM
Hi, I'm wondering if this is the expected behavior when using last or last_value in a window function? I've written a query like this:
select
col1,
col2,
last_value(col2) over (partition by col1 order by col2) as column2_last
from values
(1, 10), (1, 11), (1, 12),
(2, 20), (2, 21), (2, 22);
In Snowflake I get the following results. The behavior feels right, since we're partitioning by col1 and ordering by col2 (https://docs.snowflake.com/en/sql-reference/functions/last_value.html).
In Databricks, I get:
It seems like the ORDER BY doesn't get applied... Perhaps I am writing the query incorrectly?
Any insight into this behavior would be very helpful. Also, I'm curious why last and last_value are synonyms? https://docs.databricks.com/sql/language-manual/functions/last.html
Thanks!
โ11-19-2021 08:38 AM
@Alexis Johnsonโ - Here is where you can open a support ticket:
โ11-18-2021 11:23 AM
Hi @ alexisjohnson! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else I will get back to you soon. Thanks.
โ11-18-2021 12:44 PM
yes indeed it looks like a bug with LAST_VALUE,
Min gives correct results:
Max hmm only when desc is specified (shouldn't affect as max is max but only with descending sorting it works as should be):
for last_value is exactly like on your example (so nothing helps - I tried many ways).
@Kaniz Fatmaโ maybe someone from inside databricks could look at it
โ11-18-2021 12:47 PM
Agreed. Interesting find that `max` behaves funny when `desc` isn't specified.
I'd also note that `first_value` yields the correct result.
@Kaniz Fatmaโ Is there a process for filing bugs in Databricks? It'd be great to get this fixed.
โ11-19-2021 12:58 AM
Hi @Alexis Johnsonโ , You can open a support ticket with Databricks in order to fix this.
โ11-19-2021 08:38 AM
โ11-18-2021 12:48 PM
Thank you for your help, @Hubert Dudekโ !
โ07-12-2023 01:00 PM
For those stumbling across this; it seems LAST_VALUE emulates the same functionality as it does in SQL Server which does not, in most people's minds, have a proper row/range frame for the window. You can adjust it with the below syntax.
I understand last_value emulating what is in my mind a mistake from sql_server but I don't understand why last() (an alias for last_value) has to use the same window row range.
SELECT
col1
, col2
, LAST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 ASC) as column2_last
, LAST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as column2_last_unbound
, FIRST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 ASC) as column2_first
, FIRST_VALUE(col2) OVER (PARTITION BY col1 ORDER BY col2 DESC) as column2_first_reversed
FROM values
(1,10), (1,12), (1,11),
(2,20), (2,21), (2,22);
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโt want to miss the chance to attend and share knowledge.
If there isnโt a group near you, start one and help create a community that brings people together.
Request a New Group