Using the Spark DataFrame API, e.g.:
myDf
.filter(col("timestamp").gt(15000))
.groupBy("groupingKey")
.agg(collect_list("aDoubleValue"))
I want collect_list to return its result ordered by "timestamp", i.e. I want the groupBy results to be sorted by another column.
I know there are other issues about this, but I couldn't find a reliable answer for the DataFrame API.
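For context, the workaround I keep finding is the struct-based one: wrap each value in a struct together with its timestamp, sort the collected array, then project the values back out. A rough sketch (column names taken from the snippet above):

```scala
import org.apache.spark.sql.functions._

val sorted = myDf
  .filter(col("timestamp").gt(15000))
  .groupBy("groupingKey")
  // collect (timestamp, value) pairs; sort_array orders structs by their
  // first field, i.e. by timestamp
  .agg(sort_array(collect_list(struct(col("timestamp"), col("aDoubleValue")))).as("pairs"))
  // extract just the values, now in timestamp order
  .withColumn("aDoubleValues", col("pairs").getField("aDoubleValue"))
  .drop("pairs")
```
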
How can this be done? (The answer "sort myDf by "timestamp" before the groupBy" is not good: the shuffle performed by groupBy does not guarantee that the sort order survives.)
I already asked the question on Stack Overflow, see https://stackoverflow.com/questions/58239182/spark-sort-within-a-groupby-with-dataframe?noredirect=1... but I'd like to avoid a temporary struct (because there are many fields that I use in the group-by).
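The closest alternative I'm aware of that avoids wrapping values in a struct is a window function: collect the values over a window ordered by "timestamp", then reduce to one row per key. A sketch, assuming Spark 2.x or later (the column names are taken from the snippet above):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Window over each group, ordered by timestamp; the unbounded frame makes
// collect_list see every row of the group, in timestamp order.
val w = Window
  .partitionBy("groupingKey")
  .orderBy("timestamp")
  .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

val result = myDf
  .filter(col("timestamp").gt(15000))
  .withColumn("aDoubleValues", collect_list("aDoubleValue").over(w))
  // every row of a group now carries the same ordered list, so keep one
  .groupBy("groupingKey")
  .agg(first("aDoubleValues").as("aDoubleValues"))
```

This still repeats the grouping columns in partitionBy, but at least the values are not duplicated into a temporary struct.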
Thanks.