Warehousing & Analytics

Incorrect results of row_number() function

rocky5
New Contributor III

I wrote some simple code:

from pyspark.sql.window import Window
import pyspark.sql.functions as F

streaming_data = spark.read.table("x")

# Number rows within each BK_AccountApplicationId, highest LakehouseId first
window = Window.partitionBy("BK_AccountApplicationId").orderBy(F.col("Onboarding_External_LakehouseId").desc())

test = streaming_data.select("BK_AccountApplicationId", "Onboarding_External_LakehouseId").distinct()
test1 = test.select("BK_AccountApplicationId", "Onboarding_External_LakehouseId",
                    F.row_number().over(window).alias("row_num")).show(20, truncate=False)

I was surprised to see the results below:

+------------------------------------+-------------------------------+-------+
|BK_AccountApplicationId             |Onboarding_External_LakehouseId|row_num|
+------------------------------------+-------------------------------+-------+
|abcd0001-5775-4f93-a39a-eefb29cd8ffe|2                              |1      |
|abcd0002-5775-4f93-a39a-eefb29cd8ffe|3                              |1      |
|abcd0003-5775-4f93-a39a-eefb29cd8ffe|4                              |1      |
|abcd0004-5775-4f93-a39a-eefb29cd8ffe|5                              |1      |
|abcd0005-5775-4f93-a39a-eefb29cd8ffe|7                              |1      |
|abcd0005-5775-4f93-a39a-eefb29cd8ffe|6                              |2      |
|abcd0006-5775-4f93-a39a-eefb29cd8ffe|8                              |1      |
|abcd0007-5775-4f93-a39a-eefb29cd8ffe|9                              |1      |
|abcd0008-5775-4f93-a39a-eefb29cd8ffe|12                             |1      |
|abcd0008-5775-4f93-a39a-eefb29cd8ffe|11                             |2      |
+------------------------------------+-------------------------------+-------+

So, for one BK_AccountApplicationId you can see multiple LakehouseIds. Why do several rows with a LakehouseId lower than 12 have row_num = 1? Can anyone explain that to me?

1 ACCEPTED SOLUTION


ThomazRossito
Contributor

Hi,

In my opinion the result is correct.
What needs to be noted is that each partition is sorted by the "Onboarding_External_LakehouseId" column in descending order, so when the same "BK_AccountApplicationId" appears twice, that partition gets two row_number values (1 and 2), and the numbering starts again at 1 for every other application id.

Take the example below:
There are two rows with the same BK_AccountApplicationId, so there are two row_number values. The row with the greatest (most recent) "Onboarding_External_LakehouseId", which is 7, gets row_number 1, and the row with 6 gets row_number 2.

|abcd0005-5775-4f93-a39a-eefb29cd8ffe|7 |1 |
|abcd0005-5775-4f93-a39a-eefb29cd8ffe|6 |2 |
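
To make the per-partition numbering concrete, here is a minimal, self-contained sketch with made-up sample data (it assumes a Databricks notebook where spark is already defined; the column names are taken from this thread):

from pyspark.sql.window import Window
import pyspark.sql.functions as F

# Made-up sample data: one duplicated application id and one unique one
sample = spark.createDataFrame(
    [
        ("abcd0005-5775-4f93-a39a-eefb29cd8ffe", 7),
        ("abcd0005-5775-4f93-a39a-eefb29cd8ffe", 6),
        ("abcd0006-5775-4f93-a39a-eefb29cd8ffe", 8),
    ],
    ["BK_AccountApplicationId", "Onboarding_External_LakehouseId"],
)

w = Window.partitionBy("BK_AccountApplicationId").orderBy(
    F.col("Onboarding_External_LakehouseId").desc()
)

# row_number() restarts at 1 inside every partition: the duplicated id gets 1 and 2,
# the unique id gets only 1, regardless of how large its LakehouseId is
sample.withColumn("row_num", F.row_number().over(w)).show(truncate=False)

If the goal is to keep only the greatest Onboarding_External_LakehouseId per application id, filtering on row_num = 1 after the withColumn step would do that.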

Regards,
Thomaz Antonio Rossito Neto
Data Technical Lead / Data Engineer-Architect

