cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

How to calculate Percentile of column in a DataFrame in spark?

dheeraj
New Contributor II

I am trying to calculate percentile of a column in a DataFrame? I cant find any percentile_approx function in Spark aggregation functions. For e.g. in Hive we have percentile_approx and we can use it in the following way

hiveContext.sql("select percentile_approx("Open_Rate",0.10) from myTable); 

But I want to do it using Spark DataFrame for performance reasons.

Sample data set

    |User ID|Open_Rate|
    ------------------- 
    |A1     |10.3     |     
    |B1     |4.04     |     
    |C1     |21.7     |     
    |D1     |18.6     | 

I want to find out how many users fall into 10 percentile or 20 percentile and so on. I want to do something like this

df.select($"id",Percentile($"Open_Rate")).show

3 REPLIES 3

dheeraj
New Contributor II

correction in the question above, i want to do something like this

 df.select($"id",Percentile($"Open_Rate",0.1)).show

SiddSingal
New Contributor II

You could try to code your own version of this. It does not seem like this functionality is built into the Spark DataFrames. You may need to use the Window class in order to accomplish this. Here is a blog post with some details: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html

amandaphy
New Contributor II

You can try using df.registerTempTable("tmp_tbl") val newDF = sql(/ do something with tmp_tbl /)// and continue using newDF Learn More

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group