<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: KNN classifier on Spark in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/knn-classifier-on-spark/m-p/29330#M21070</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hi, KNN is notoriously hard to parallelize in Spark because it is a "lazy learner": it does no real work at training time, and the model itself is the entire dataset. Most single-machine implementations rely on KD-trees or ball trees, which hold the entire dataset in the RAM of one machine. If you really want to use KNN, I would recommend scikit-learn's single-machine implementation with a simple random sample of the dataset.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 27 Dec 2016 18:51:19 GMT</pubDate>
    <dc:creator>rlgarris</dc:creator>
    <dc:date>2016-12-27T18:51:19Z</dc:date>
    <item>
      <title>KNN classifier on Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/knn-classifier-on-spark/m-p/29328#M21068</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hi Team,&lt;/P&gt;
&lt;P&gt;Can you please help me implement a KNN classifier in PySpark that uses the distributed architecture to process the dataset?&lt;/P&gt;
&lt;P&gt;I also want to validate the KNN model against a testing dataset.&lt;/P&gt;
&lt;P&gt;I tried scikit-learn, but the program runs locally. I want to distribute the classifier while training the model.&lt;/P&gt;
&lt;P&gt;At the end, I want to validate the classifier with the testing dataset and calculate the accuracy.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Dec 2016 00:50:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/knn-classifier-on-spark/m-p/29328#M21068</guid>
      <dc:creator>Muthu145</dc:creator>
      <dc:date>2016-12-20T00:50:09Z</dc:date>
    </item>
    <item>
      <title>Re: KNN classifier on Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/knn-classifier-on-spark/m-p/29329#M21069</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Refer to the programming guide to see the algorithms available in MLlib:&lt;/P&gt;
&lt;P&gt;&lt;A href="http://spark.apache.org/docs/latest/ml-classification-regression.html" target="test_blank"&gt;http://spark.apache.org/docs/latest/ml-classification-regression.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;MLlib has no KNN implementation, so you might want to try another algorithm that is available.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Dec 2016 17:51:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/knn-classifier-on-spark/m-p/29329#M21069</guid>
      <dc:creator>raela</dc:creator>
      <dc:date>2016-12-22T17:51:16Z</dc:date>
    </item>
    <item>
      <title>Re: KNN classifier on Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/knn-classifier-on-spark/m-p/29330#M21070</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hi, KNN is notoriously hard to parallelize in Spark because it is a "lazy learner": it does no real work at training time, and the model itself is the entire dataset. Most single-machine implementations rely on KD-trees or ball trees, which hold the entire dataset in the RAM of one machine. If you really want to use KNN, I would recommend scikit-learn's single-machine implementation with a simple random sample of the dataset.&lt;/P&gt; 
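&lt;P&gt;To make the "model is the dataset" point concrete, here is a minimal sketch that is not part of the original reply. It uses only the Python standard library and a brute-force scan in place of scikit-learn's tree-based classifier, and the toy clusters, the sample size of 100, and the helper names knn_predict and accuracy are all hypothetical choices made for illustration.&lt;/P&gt;

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Brute-force KNN: scan every stored row, then vote among the k closest."""
    # "Training" is just keeping the rows; all the work happens at query time.
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


# Hypothetical toy data: two well-separated 2-D clusters labelled 0 and 1.
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(500)]
data += [((rng.gauss(10, 1), rng.gauss(10, 1)), 1) for _ in range(500)]
rng.shuffle(data)

# Hold out a test set, then keep only a simple random sample of the rest.
test_rows, pool = data[:200], data[200:]
train_sample = rng.sample(pool, 100)

acc = accuracy(train_sample, test_rows, k=5)
print("rows kept as the model:", len(train_sample))
print("test accuracy:", acc)
```

&lt;P&gt;Every prediction scans all stored rows, so shrinking the sample cuts both memory and query time directly. In practice scikit-learn's KNeighborsClassifier would likely replace knn_predict here, since it can index the sample with a KD-tree or ball tree instead of scanning it.&lt;/P&gt;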
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Dec 2016 18:51:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/knn-classifier-on-spark/m-p/29330#M21070</guid>
      <dc:creator>rlgarris</dc:creator>
      <dc:date>2016-12-27T18:51:19Z</dc:date>
    </item>
    <item>
      <title>Re: KNN classifier on Spark</title>
      <link>https://community.databricks.com/t5/data-engineering/knn-classifier-on-spark/m-p/29331#M21071</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hey, how about using the NEC Frovedis framework (https://github.com/frovedis/frovedis) for this?&lt;/P&gt;
&lt;P&gt;Refer: &lt;A href="https://github.com/frovedis/frovedis/blob/master/src/foreign_if/python/examples/unsupervised_knn_demo.py" target="test_blank"&gt;https://github.com/frovedis/frovedis/blob/master/src/foreign_if/python/examples/unsupervised_knn_demo.py&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;It is built on a distributed, MPI-based framework and can run on any system.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 05 Feb 2020 02:31:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/knn-classifier-on-spark/m-p/29331#M21071</guid>
      <dc:creator>SouravSaha</dc:creator>
      <dc:date>2020-02-05T02:31:46Z</dc:date>
    </item>
  </channel>
</rss>

