Databricks Community

dshosseinyousef · ‎09-22-2016

I have the following Spark DataFrame:

agent_id | payment_amount
a | 1000
b | 1100
a | 1100
a | 1200
b | 1200
b | 1250
a | 10000
b | 9000

My desired output would be something like:

<code>agent_id   95_quantile
  a          whatever the 0.95 quantile is for agent a's payments
  b          whatever the 0.95 quantile is for agent b's payments</code>

For each group of agent_id I need to calculate the 0.95 quantile. I tried the following approach:

<code>test_df.groupby('agent_id').approxQuantile('payment_amount', 0.95)</code>

but I get the following error:

<code>'GroupedData' object has no attribute 'approxQuantile'</code>

I need the 0.95 quantile (percentile) in a new column so that it can be used later for filtering.

dshosseinyousef · ‎09-22-2016

@bill I'd appreciate your help, as this is crucial for me.

Weiluo__David_R · ‎12-30-2016

For those who haven't come across this Stack Overflow thread: http://stackoverflow.com/questions/39633614/calculate-quantile-on-grouped-data-in-spark-dataframe. It points out that one workaround is to use the Hive aggregate function (UDAF) "percentile_approx" instead of approxQuantile. Please see the accepted answer in that thread.

How to calculate quantile on grouped data in Spark DataFrame