collect_list by preserving order based on another variable - Spark SQL

Constantine
Contributor III

I am using a Databricks SQL notebook to run these queries.

I have a Python UDF like this:

%python

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType, DoubleType, DateType

def get_sell_price(sale_prices):
    # Return the first element of the collected list of prices
    return sale_prices[0]

spark.udf.register("get_sell_price", get_sell_price, DoubleType())

It is called from a query like this:

SELECT
  id,
  get_sell_price(collect_list(sell_price))
FROM
  table_name
GROUP BY
  id
ORDER BY
  date;

I want the sell prices inside the `collect_list` to be sorted by the specified column (`date`), so that the UDF receives them in that order. Even though I specify the ordering in the query, the order is not maintained.
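
To make the desired result concrete, here is a minimal plain-Python sketch (no Spark involved) of the behaviour I am after. The sample rows, the `rows` variable, and the helper `ordered_prices_by_id` are made up for illustration; only `get_sell_price` comes from the UDF above.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (id, date, sell_price). Made up for illustration.
rows = [
    (1, date(2021, 3, 1), 30.0),
    (1, date(2021, 1, 1), 10.0),
    (2, date(2021, 2, 1), 25.0),
    (1, date(2021, 2, 1), 20.0),
    (2, date(2021, 1, 15), 15.0),
]


def get_sell_price(sale_prices):
    # Same logic as the UDF: take the first price in the list.
    return sale_prices[0]


def ordered_prices_by_id(rows):
    # Hypothetical helper: collect (date, price) pairs per id, then sort
    # each group by date so the price list follows date order.
    grouped = defaultdict(list)
    for row_id, row_date, price in rows:
        grouped[row_id].append((row_date, price))
    return {
        row_id: [price for _, price in sorted(pairs)]
        for row_id, pairs in grouped.items()
    }


ordered = ordered_prices_by_id(rows)
first_prices = {row_id: get_sell_price(prices) for row_id, prices in ordered.items()}
```

This only illustrates the intended output, it is not Spark code. In Spark SQL, one pattern that may give the same effect is to collect `struct(date, sell_price)` pairs and apply `sort_array` to the collected array before extracting the price, since `sort_array` orders structs by their first field.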