How to calculate a quantile on grouped data in a Spark DataFrame
09-22-2016 01:29 AM
I have the following Spark DataFrame:

```
agent_id | payment_amount
a        | 1000
b        | 1100
a        | 1100
a        | 1200
b        | 1200
b        | 1250
a        | 10000
b        | 9000
```
My desired output would be something like:

```
agent_id | 95_quantile
a        | <whatever the 95th quantile of agent a's payments is>
b        | <whatever the 95th quantile of agent b's payments is>
```
For each `agent_id` group I need to calculate the 0.95 quantile. I took the following approach:

```
test_df.groupby('agent_id').approxQuantile('payment_amount', 0.95)
```

but I get the following error:

```
'GroupedData' object has no attribute 'approxQuantile'
```

I need to have the 0.95 quantile (percentile) in a new column so it can later be used for filtering purposes.
09-22-2016 01:30 AM
@bill I'd appreciate your help, as this is very important to me.
12-30-2016 10:17 AM
For those of you who haven't run into this SO thread (http://stackoverflow.com/questions/39633614/calculate-quantile-on-grouped-data-in-spark-dataframe), it points out that one workaround is to use the Hive UDF `percentile_approx`. Please see the accepted answer in that thread.