Re: Pyspark Dataframes orderby only orders within ...

MarkusFra · ‎03-21-2024

Sorry if I have to ask again, but I am a bit confused by this.

I thought, that pysparks `orderBy()` and `sort()` do a shuffle operation before the sorting for exact this reason. There is another command `sortWithinPartitions()` that does not do that and does a partition wise sorting. I am acutally suprised that `sort()` also only works partition wise. But then: Why does it work on a singleNode Cluster on a partitioned DataFrame?