Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

KNN classifier on Spark

Muthu145
New Contributor

Hi Team,

Can you please help me implement a KNN classifier in PySpark that uses a distributed architecture to process the dataset?

I also want to validate the KNN model with the testing dataset.

I tried scikit-learn, but the program runs locally. I want to distribute the classifier while training the model.

At the end, I want to validate the classifier with the testing dataset and calculate the accuracy.
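One common way to distribute brute-force KNN is to keep the training data partitioned, let each partition find its own k nearest candidates for every test point, and then merge those per-partition candidates into a global top k before voting. Because the true k nearest neighbours must appear among the union of each partition's local top k, the merge is exact. The sketch below simulates that pattern in plain Python, with lists standing in for Spark partitions. It is an illustration under stated assumptions, not a Spark API: the function names and toy data are hypothetical. In PySpark the per-partition step would roughly correspond to `RDD.mapPartitions`, with the test points shipped to executors as a broadcast variable.

```python
import heapq
import math
from collections import Counter

def partition(data, num_partitions):
    """Hypothetical helper: split training rows round-robin, standing in for Spark partitions."""
    return [data[i::num_partitions] for i in range(num_partitions)]

def local_top_k(train_part, query, k):
    """Per-partition step: the k closest (distance, label) pairs within one partition."""
    dists = ((math.dist(x, query), y) for x, y in train_part)
    return heapq.nsmallest(k, dists, key=lambda t: t[0])

def predict(partitions, query, k):
    """Merge step: pool every partition's candidates, keep the global k nearest, majority vote."""
    candidates = [c for part in partitions for c in local_top_k(part, query, k)]
    nearest = heapq.nsmallest(k, candidates, key=lambda t: t[0])
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(partitions, test, k):
    """Fraction of test rows whose predicted label matches the true label."""
    correct = sum(predict(partitions, x, k) == y for x, y in test)
    return correct / len(test)

# Hypothetical toy data: ((feature1, feature2), label) pairs.
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b")]
test = [((0.05, 0.1), "a"), ((1.0, 0.95), "b")]

parts = partition(train, num_partitions=3)
print(accuracy(parts, test, k=3))
```

Note that every test point is still compared against every training row, so the work grows with the full training set even when it is spread across machines.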

3 REPLIES

raela
Databricks Employee

Refer to the programming guide to see the algorithms available in MLlib:

http://spark.apache.org/docs/latest/ml-classification-regression.html

MLlib does not include KNN, so you might want to try one of the algorithms that is available.

rlgarris
Databricks Employee

Hi - KNN is notoriously hard to parallelize in Spark because KNN is a "lazy learner": the model itself is the entire dataset. Most single-machine implementations rely on KD-trees or ball trees to hold the entire dataset in the RAM of a single machine. If you really want to use KNN, I would recommend using scikit-learn's single-machine implementation on a simple random sample of the dataset.
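The sample-then-train-locally suggestion can be sketched roughly as follows. This is a plain-Python stand-in, assuming the sample fits in driver memory; it uses brute-force distance computation rather than a KD-tree, and the helper names and toy data are hypothetical. With scikit-learn, the classifier step would typically be `KNeighborsClassifier(n_neighbors=k, algorithm="kd_tree")` followed by `accuracy_score` on the held-out set, and in PySpark the sample could be drawn with `DataFrame.sample(fraction=...)` and collected to the driver with `toPandas()`.

```python
import math
import random
from collections import Counter

def simple_random_sample(rows, n, seed=0):
    """Draw n rows uniformly without replacement (a simple random sample)."""
    return random.Random(seed).sample(rows, n)

def knn_predict(train, query, k):
    """Brute-force KNN: rank training rows by Euclidean distance, majority vote over the k closest."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of held-out rows classified correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

# Hypothetical "large" training set: two well-separated Gaussian clusters.
rng = random.Random(42)
full_train = ([((rng.gauss(0, 0.3), rng.gauss(0, 0.3)), "a") for _ in range(5000)] +
              [((rng.gauss(3, 0.3), rng.gauss(3, 0.3)), "b") for _ in range(5000)])
# Hypothetical held-out test rows.
test = [((0.1, -0.1), "a"), ((2.9, 3.2), "b"), ((-0.2, 0.3), "a")]

# Train on a 5% simple random sample instead of the full dataset.
sample = simple_random_sample(full_train, n=500)
print(accuracy(sample, test, k=5))
```

The trade-off is that a sample may miss rare regions of the feature space, so the sample size should be as large as a single machine comfortably allows.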

SouravSaha
New Contributor II

Hey, how about using the NEC Frovedis (https://github.com/frovedis/frovedis) framework for this?

Refer: https://github.com/frovedis/frovedis/blob/master/src/foreign_if/python/examples/unsupervised_knn_dem...

It works on a distributed framework (MPI based) and can run on any system.
