IM_01
Valued Contributor

  @-werners- Thanks for sharing, it seems interesting. I will go through the book.
  @Ashwin_DSA I still have some confusion, sorry 🙂 Normally when we do df.orderBy().display() it works, right? (Since display() is an action, it also performs all the operations defined before it, including the orderBy.)

Is it that the query planner breaks down the operations and the engine performs them over each partition, so that if the ordering has to be maintained, the optimizer has to perform the orderBy for each partition, which kills parallelism because the data is distributed? That part I was able to follow. My only confusion is that when we do df.orderBy().display(), it works. Maybe there is some point I am missing in my understanding, sorry.
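
To make my mental model concrete, here is a rough plain-Python sketch (not Spark code; the `partitions` list and its values are made up purely for illustration). It shows how I picture it: sorting each partition on its own can happen in parallel, but that alone does not give a globally ordered result, so some extra combining step across partitions seems to be needed. I am not claiming this is how Spark actually implements orderBy.

```python
import heapq

# Hypothetical data split across three "partitions" (plain lists, not Spark).
partitions = [[7, 2, 9], [4, 8, 1], [6, 3, 5]]

# Each partition can be sorted independently of the others,
# so this part could run in parallel.
sorted_parts = [sorted(p) for p in partitions]

# Simply concatenating the sorted partitions is not globally ordered.
concatenated = [x for part in sorted_parts for x in part]

# A global order needs an extra step that combines data across partitions;
# here a k-way merge of the already-sorted lists stands in for that step.
globally_sorted = list(heapq.merge(*sorted_parts))

print(concatenated)
print(globally_sorted)
```

So my question is really about where that second, cross-partition step happens when display() is called after orderBy().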