<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic how to Calculate quantile on grouped data in spark Dataframe in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-calculate-quantile-on-grouped-data-in-spark-dataframe/m-p/29398#M21126</link>
    <description></description>
    <pubDate>Thu, 22 Sep 2016 08:29:26 GMT</pubDate>
    <dc:creator>dshosseinyousef</dc:creator>
    <dc:date>2016-09-22T08:29:26Z</dc:date>
    <item>
      <title>how to Calculate quantile on grouped data in spark Dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-calculate-quantile-on-grouped-data-in-spark-dataframe/m-p/29398#M21126</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I have the following Spark DataFrame:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;agent_id   payment_amount
a          1000
b          1100
a          1100
a          1200
b          1200
b          1250
a          10000
b          9000&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;My desired output would be something like:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;agent_id   95_quantile
  a          0.95 quantile of agent a's payments
  b          0.95 quantile of agent b's payments&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;For each agent_id group I need to calculate the 0.95 quantile. I tried the following approach:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;test_df.groupby('agent_id').approxQuantile('payment_amount',0.95)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;But I get the following error:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;'GroupedData' object has no attribute 'approxQuantile'&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I need the 0.95 quantile (percentile) in a new column so that it can be used later for filtering.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Sep 2016 08:29:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-calculate-quantile-on-grouped-data-in-spark-dataframe/m-p/29398#M21126</guid>
      <dc:creator>dshosseinyousef</dc:creator>
      <dc:date>2016-09-22T08:29:26Z</dc:date>
    </item>
    <item>
      <title>Re: how to Calculate quantile on grouped data in spark Dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-calculate-quantile-on-grouped-data-in-spark-dataframe/m-p/29399#M21127</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;@bill I'd appreciate your help, as this is crucial for me.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Sep 2016 08:30:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-calculate-quantile-on-grouped-data-in-spark-dataframe/m-p/29399#M21127</guid>
      <dc:creator>dshosseinyousef</dc:creator>
      <dc:date>2016-09-22T08:30:51Z</dc:date>
    </item>
    <item>
      <title>Re: how to Calculate quantile on grouped data in spark Dataframe</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-calculate-quantile-on-grouped-data-in-spark-dataframe/m-p/29400#M21128</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;For those of you who haven't seen this Stack Overflow thread: &lt;A href="http://stackoverflow.com/questions/39633614/calculate-quantile-on-grouped-data-in-spark-dataframe" target="test_blank"&gt;http://stackoverflow.com/questions/39633614/calculate-quantile-on-grouped-data-in-spark-dataframe&lt;/A&gt;. It points out that one workaround is to use the Hive UDF "percentile_approx". Please see the accepted answer in that thread.&lt;/P&gt;
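&lt;P&gt;Here is a minimal PySpark sketch of that workaround, assuming a Spark build where the SQL function percentile_approx is available (on older releases this may require Hive support). The helper names quantile_sql, quantile_per_group and with_group_quantile, and the default output column quantile_95, are hypothetical and not from the thread; the column names agent_id and payment_amount come from the question.&lt;/P&gt;

```python
def quantile_sql(value_col, quantile=0.95):
    """Return the SQL text for an approximate quantile of value_col."""
    return "percentile_approx({}, {})".format(value_col, quantile)


def quantile_per_group(df, group_col, value_col, quantile=0.95, out_col="quantile_95"):
    """Aggregate to one row per group holding its approximate quantile."""
    # Imported here so the module still loads where pyspark is not installed.
    from pyspark.sql import functions as F

    agg_expr = F.expr(quantile_sql(value_col, quantile)).alias(out_col)
    return df.groupBy(group_col).agg(agg_expr)


def with_group_quantile(df, group_col, value_col, quantile=0.95, out_col="quantile_95"):
    """Keep every row and add the group's approximate quantile as a new column."""
    from pyspark.sql import Window
    from pyspark.sql import functions as F

    # Evaluate the same aggregate over a window partitioned by the group column.
    per_group = Window.partitionBy(group_col)
    q = F.expr(quantile_sql(value_col, quantile)).over(per_group)
    return df.withColumn(out_col, q)
```

&lt;P&gt;With the DataFrame from the question, with_group_quantile(test_df, 'agent_id', 'payment_amount') should add a quantile_95 column to every row, which can then be used in a filter. quantile_per_group returns one row per agent_id instead, if only the summary table is needed.&lt;/P&gt;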
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Dec 2016 18:17:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-calculate-quantile-on-grouped-data-in-spark-dataframe/m-p/29400#M21128</guid>
      <dc:creator>Weiluo__David_R</dc:creator>
      <dc:date>2016-12-30T18:17:54Z</dc:date>
    </item>
  </channel>
</rss>

