Calculate median and interquartile range on a Spark DataFrame

NarwshKumar
New Contributor

I have a Spark DataFrame with 5 columns and I want to calculate the median and interquartile range of each of them. I can't figure out how to write a UDF and call it on the columns.

raela
Databricks Employee

Hi,

You can refer to Databricks' docs on how to create UDFs.

Python UDFs | Scala UDFs

rlgarris
Databricks Employee

Hi,

You can also use window functions: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html

The values at percent_rank 0.25, 0.50 and 0.75 give you the first quartile, the median and the third quartile; the interquartile range is Q3 - Q1.

jmwilli25
New Contributor II

Here is the easiest way to calculate this... https://stackoverflow.com/questions/37032689/scala-first-quartile-third-quartile-and-iqr-from-spark-...

No Hive or windowing necessary.