Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to calculate a quantile on grouped data in a Spark DataFrame

dshosseinyousef
New Contributor II

I have the following Spark DataFrame:

agent_id   payment_amount
a          1000
b          1100
a          1100
a          1200
b          1200
b          1250
a          10000
b          9000

My desired output would be something like:

agent_id   95_quantile
a          whatever the 0.95 quantile is for agent a's payments
b          whatever the 0.95 quantile is for agent b's payments

For each agent_id group I need to calculate the 0.95 quantile. I tried the following approach:

test_df.groupby('agent_id').approxQuantile('payment_amount', 0.95)

But I get the following error:

'GroupedData' object has no attribute 'approxQuantile'

I need the 0.95 quantile (95th percentile) in a new column so that it can be used later for filtering.

2 REPLIES

dshosseinyousef
New Contributor II

@bill I'd appreciate your help, as this is crucial for me.

Weiluo__David_R
New Contributor II

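For those of you who haven't run into this Stack Overflow thread, http://stackoverflow.com/questions/39633614/calculate-quantile-on-grouped-data-in-spark-dataframe, it points out that one workaround is to use the Hive UDAF "percentile_approx" as an aggregate expression after the groupBy. approxQuantile is a method of DataFrame, not of GroupedData, which is why the call in the question fails. Please see the accepted answer in that thread.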
