Calculate median and interquartile range on a Spark DataFrame

NarwshKumar
New Contributor

I have a Spark DataFrame with 5 columns and I want to calculate the median and interquartile range for each of them. I can't figure out how to write UDFs and call them on the columns.

3 REPLIES

raela
New Contributor III

Hi,

You can refer to Databricks' docs on how to create UDFs.

Python UDFs | Scala UDFs

User16826991422
Contributor

Hi,

You can also use window functions: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html

The values at a percent_rank of 0.25, 0.50 and 0.75 will give you what you are looking for.
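
To make the percent_rank idea concrete, here is a rough plain-Python sketch of the arithmetic behind it: over an ordered window, percent_rank is (rank - 1) / (n - 1). The helper names `percent_rank` and `value_at` and the sample data are made up for illustration. Picking the first value whose percent_rank reaches the target is one convention among several, with no interpolation. It assumes a non-empty list of numbers.

```python
def percent_rank(sorted_values):
    """percent_rank for already-sorted values: (rank - 1) / (n - 1).

    Ties share the lowest rank, as with SQL RANK().
    """
    n = len(sorted_values)
    if n == 1:
        return [0.0]
    ranks = []
    for i, v in enumerate(sorted_values):
        if i > 0 and v == sorted_values[i - 1]:
            ranks.append(ranks[-1])  # tie: reuse the previous rank
        else:
            ranks.append(i + 1)      # rank is the 1-based position
    return [(r - 1) / (n - 1) for r in ranks]


def value_at(values, p):
    """Hypothetical helper: first value whose percent_rank is >= p."""
    ordered = sorted(values)
    for v, pr in zip(ordered, percent_rank(ordered)):
        if pr >= p:
            return v
    return ordered[-1]


data = [7, 1, 3, 9, 5]  # made-up sample column
q1, median, q3 = (value_at(data, p) for p in (0.25, 0.50, 0.75))
print(q1, median, q3, q3 - q1)  # prints: 3 5 7 4

# Rough Spark equivalent of the ranking step (not executed here):
#   from pyspark.sql import Window, functions as F
#   df.withColumn("pr", F.percent_rank().over(Window.orderBy("some_column")))
# A window with no partitionBy pulls all rows into a single partition.
```

With real data the percent_rank values rarely land exactly on 0.25, 0.50 or 0.75, so you still have to choose a rule such as "first row at or above the target" for each column.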

jmwilli25
New Contributor II

Here is the easiest way to calculate this... https://stackoverflow.com/questions/37032689/scala-first-quartile-third-quartile-and-iqr-from-spark-...

No Hive or windowing necessary.
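
One way to do this with neither Hive nor window functions is `DataFrame.approxQuantile`, available in PySpark since Spark 2.0. Whether the linked answer uses exactly this call is an assumption on my part. The sketch below is a minimal version, not a definitive implementation: `median_and_iqr` and `demo` are hypothetical names, the sample data is made up, and every column is assumed to be numeric with at least one non-null value.

```python
def median_and_iqr(df, columns, relative_error=0.0):
    """Return {column: {"q1", "median", "q3", "iqr"}} for a Spark DataFrame.

    relative_error=0.0 asks for exact quantiles, which can be expensive
    on large data; a small positive value such as 0.01 trades accuracy
    for speed.
    """
    stats = {}
    for name in columns:
        # approxQuantile returns one value per requested probability.
        q1, median, q3 = df.approxQuantile(
            name, [0.25, 0.5, 0.75], relative_error
        )
        stats[name] = {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}
    return stats


def demo():
    # Needs a working PySpark installation; not called automatically.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]").appName("iqr-demo").getOrCreate()
    )
    df = spark.createDataFrame(
        [(float(i), float(i * 10)) for i in range(1, 10)], ["a", "b"]
    )
    for name, s in median_and_iqr(df, df.columns).items():
        print(name, s)
    spark.stop()
```

No UDF is needed: calling `median_and_iqr(df, df.columns)` covers all 5 columns in one loop, and the results come back to the driver as plain Python floats.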