cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

collect_list by preserving order based on another variable - Spark SQL

Constantine
Contributor III

I am using databricks sql notebook to run these queries. 

I have a Python UDF like 

  

%python
 
 from pyspark.sql.functions import udf
 from pyspark.sql.types import StringType, DoubleType, DateType
 
 def get_sell_price(sale_prices):
     return sale_price[0] 
 
spark.udf.register("get_sell_price", get_sell_price, DoubleType()) 

This is running on a query like 

SELECT
  id,
  get_sell_price(sell_price)
FROM
  table_name
GROUP BY
  id
ORDER BY
  date;

I want the sell price inside the `collect_list` to be sorted based on the specified column, but even though I mention it in the query, it still doesn't maintain the order

1 ACCEPTED SOLUTION

Accepted Solutions

-werners-
Esteemed Contributor III

@John Constantine​, I think you want to use a window function for this.

View solution in original post

3 REPLIES 3

Hubert-Dudek
Esteemed Contributor III

@John Constantine​ , "The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle." https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.functions.collect_list.htm...

Generally using collect_list in production is not the best solution. Usually, there are other ways to achieve what is needed.

-werners-
Esteemed Contributor III

@John Constantine​, I think you want to use a window function for this.

Kaniz
Community Manager
Community Manager

Hi @John Constantine​ , Just a friendly follow-up. Do you still need help or do the above responses help you find the solution? Please let us know.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.