02-06-2016 07:11 AM
I have a Spark DataFrame with 5 columns and I want to calculate the median and interquartile range for all of them. I can't figure out how to write a UDF and call it on the columns.
02-08-2016 04:23 PM
Hi,
You can refer to Databricks' docs on how to create UDFs.
Python UDFs | Scala UDFs
02-09-2016 07:42 AM
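You can also use window functions: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
The values at which percent_rank reaches 0.25, 0.50 and 0.75 give you what you are looking for: the first quartile, the median and the third quartile (the IQR is the third quartile minus the first).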
05-23-2017 03:28 PM
Here is the easiest way to calculate this: https://stackoverflow.com/questions/37032689/scala-first-quartile-third-quartile-and-iqr-from-spark-...
No Hive or windowing necessary.