Re: How to calculate correlation matrix (with all columns at once) in pyspark dataframe?

1 ACCEPTED SOLUTION

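Got it. Use the following:

from pyspark.mllib.stat import Statistics

# Statistics.corr expects an RDD of numeric rows, so every column must be numeric.
# On Spark 2.0+ go through dataset.rdd, because DataFrame no longer exposes .map directly.
features = dataset.rdd.map(lambda row: row[0:])

# Pairwise Pearson correlation across all columns at once.
corr_mat = Statistics.corr(features, method="pearson")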
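
To sanity-check what the matrix contains, here is a minimal pure-Python sketch that computes the same kind of pairwise Pearson matrix for a few rows held in local memory. The function name `pearson_matrix` and the sample values are made up for illustration, and returning NaN for a constant column is this sketch's own choice, not a statement about Spark's behavior.

```python
import math


def pearson_matrix(rows):
    """Pairwise Pearson correlation for a list of equal-length numeric rows."""
    n = len(rows)
    cols = list(zip(*rows))  # transpose: one tuple per column
    means = [sum(c) / n for c in cols]
    # Subtract each column's mean so dot products become covariances (up to 1/n).
    centered = [[v - m for v in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(cols)
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            denom = norms[i] * norms[j]
            # A constant column has zero norm; report NaN (sketch's choice).
            matrix[i][j] = dot / denom if denom else float("nan")
    return matrix


# Hypothetical sample: column 1 is exactly 2x column 0, column 2 falls as column 0 rises.
sample_rows = [(1.0, 2.0, 9.0), (2.0, 4.0, 7.0), (3.0, 6.0, 4.0), (4.0, 8.0, 1.0)]
local_corr = pearson_matrix(sample_rows)
for row in local_corr:
    print([round(v, 4) for v in row])
```

Entry (i, j) is the correlation between column i and column j, so the diagonal is 1 and the matrix is symmetric. One way to use this would be to pull a small sample to the driver (for example with `dataset.limit(...).collect()`) and compare the result against the corresponding Spark output on that same sample.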
