<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Sort within a groupBy with dataframe in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/sort-within-a-groupby-with-dataframe/m-p/27698#M19559</link>
    <description>&lt;P&gt;Using a Spark DataFrame, e.g.:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import org.apache.spark.sql.functions.{col, collect_list}

myDf
  .filter(col("timestamp").gt(15000))
  .groupBy("groupingKey")
  .agg(collect_list("aDoubleValue"))
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I want collect_list to return its result ordered by "timestamp", i.e. I want the values within each group to be sorted by another column.&lt;/P&gt;
&lt;P&gt;I know there are other questions about this, but I couldn't find a reliable answer for DataFrames.&lt;/P&gt;
&lt;P&gt;How can this be done? (Sorting myDf by "timestamp" before the groupBy is not a good answer.)&lt;/P&gt;
&lt;P&gt;I already asked this question on Stack Overflow, see &lt;A href="https://stackoverflow.com/questions/58239182/spark-sort-within-a-groupby-with-dataframe?noredirect=1#comment102852695_58239182" target="test_blank"&gt;https://stackoverflow.com/questions/58239182/spark-sort-within-a-groupby-with-dataframe?noredirect=1#comment102852695_58239182&lt;/A&gt;, but I would prefer not to use a temporary structure, because I use many fields in the group-by.&lt;/P&gt;
&lt;P&gt;Thanks.&lt;/P&gt;</description>
    <pubDate>Mon, 07 Oct 2019 07:01:20 GMT</pubDate>
    <dc:creator>LaurentThiebaud</dc:creator>
    <dc:date>2019-10-07T07:01:20Z</dc:date>
    <item>
      <title>Sort within a groupBy with dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/sort-within-a-groupby-with-dataframe/m-p/27698#M19559</link>
      <description>&lt;P&gt;Using a Spark DataFrame, e.g.:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import org.apache.spark.sql.functions.{col, collect_list}

myDf
  .filter(col("timestamp").gt(15000))
  .groupBy("groupingKey")
  .agg(collect_list("aDoubleValue"))
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I want collect_list to return its result ordered by "timestamp", i.e. I want the values within each group to be sorted by another column.&lt;/P&gt;
&lt;P&gt;I know there are other questions about this, but I couldn't find a reliable answer for DataFrames.&lt;/P&gt;
&lt;P&gt;How can this be done? (Sorting myDf by "timestamp" before the groupBy is not a good answer.)&lt;/P&gt;
&lt;P&gt;I already asked this question on Stack Overflow, see &lt;A href="https://stackoverflow.com/questions/58239182/spark-sort-within-a-groupby-with-dataframe?noredirect=1#comment102852695_58239182" target="test_blank"&gt;https://stackoverflow.com/questions/58239182/spark-sort-within-a-groupby-with-dataframe?noredirect=1#comment102852695_58239182&lt;/A&gt;, but I would prefer not to use a temporary structure, because I use many fields in the group-by.&lt;/P&gt;
&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Mon, 07 Oct 2019 07:01:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sort-within-a-groupby-with-dataframe/m-p/27698#M19559</guid>
      <dc:creator>LaurentThiebaud</dc:creator>
      <dc:date>2019-10-07T07:01:20Z</dc:date>
    </item>
    <item>
      <title>Re: Sort within a groupBy with dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/sort-within-a-groupby-with-dataframe/m-p/27699#M19560</link>
      <description>&lt;P&gt;Hi @Laurent Thiebaud,&lt;/P&gt;&lt;P&gt;You can use the following pattern to sort the collected values within a groupBy:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import org.apache.spark.sql.functions._
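
// Sketch only, not part of the original reply: to order the collected values by a
// different column, one option is to collect (sortKey, value) structs, sort them,
// and then project the value field. The "timestamp" column is assumed from the
// question; "pairs" and "columnB_by_timestamp" are hypothetical names.
// sort_array compares structs field by field, so rows with equal timestamps
// fall back to ordering by columnB.
val orderedByTimestamp = df
  .groupBy("columnA")
  .agg(sort_array(collect_list(struct(col("timestamp"), col("columnB")))).as("pairs"))
  .select(col("columnA"), col("pairs.columnB").as("columnB_by_timestamp"))
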
// Note: sort_array orders the collected values themselves (ascending by default), not by another column.
df.groupBy("columnA").agg(sort_array(collect_list("columnB")))&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 07 Oct 2019 08:43:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sort-within-a-groupby-with-dataframe/m-p/27699#M19560</guid>
      <dc:creator>shyam_9</dc:creator>
      <dc:date>2019-10-07T08:43:59Z</dc:date>
    </item>
  </channel>
</rss>

