How to calculate correlation matrix (with all columns at once) in pyspark dataframe?

washim — Mon, 28 Dec 2015 09:07:31 GMT

Re: How to calculate correlation matrix (with all columns at once) in pyspark dataframe?

washim — Mon, 28 Dec 2015 10:00:58 GMT

Got it. Use:

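from pyspark.mllib.stat import Statistics

# Go through the DataFrame's underlying RDD; each row becomes one vector
# (assumes every column in dataset is numeric)
features = dataset.rdd.map(lambda row: row[0:])

# Pearson correlation matrix across all columns at once
corr_mat = Statistics.corr(features, method="pearson")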
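
To sanity-check the result on a small sample, it can help to compare against a plain-Python Pearson matrix computed locally. This is only a rough sketch, not part of Spark: pearson_corr_matrix and sample_rows are made-up names for illustration, and the sample values are invented. Entry [i][j] is the correlation between column i and column j, which is the same layout as the matrix Statistics.corr gives back.

```python
import math

def pearson_corr_matrix(rows):
    """Pearson correlation matrix for a list of equal-length numeric rows.

    Hypothetical helper for local checks on small samples only.
    """
    n = len(rows)
    k = len(rows[0])
    # Per-column means
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    # Centered columns: subtract each column's mean from its values
    cols = [[r[j] - means[j] for r in rows] for j in range(k)]
    # Euclidean norm of each centered column
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    mat = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            dot = sum(a * b for a, b in zip(cols[i], cols[j]))
            denom = norms[i] * norms[j]
            # A constant column has zero variance; this sketch reports NaN for it
            mat[i][j] = dot / denom if denom else float("nan")
    return mat

# Invented sample: column 1 is exactly 2x column 0, column 2 decreases
sample_rows = [(1.0, 2.0, 9.0), (2.0, 4.0, 7.0), (3.0, 6.0, 4.0), (4.0, 8.0, 2.0)]
local_corr = pearson_corr_matrix(sample_rows)
```

If the same few rows are pulled from the DataFrame (for example with collect() on a small sample), local_corr should agree with corr_mat up to floating-point error.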