<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Pyspark. How to get best params in grid search in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29173#M20925</link>
    <description>&lt;P&gt;Hi @pmezentsev,&lt;/P&gt;&lt;P&gt;You can build a parameter grid with different values for each parameter and then find the best params using scikit-learn's GridSearchCV. After fitting, the best combination is available as CV_rfc.best_params_:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;param_grid = {'n_estimators': [200, 500, 700], 'max_features': ['auto', 'sqrt', 'log2']}
CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=5)&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Fri, 13 Sep 2019 06:07:20 GMT</pubDate>
    <dc:creator>shyam_9</dc:creator>
    <dc:date>2019-09-13T06:07:20Z</dc:date>
    <item>
      <title>Pyspark. How to get best params in grid search</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29167#M20919</link>
      <description>&lt;P&gt;Hello!&lt;/P&gt;&lt;P&gt;I am using Spark 2.1.1 from Python (Python 2.7, running in a Jupyter notebook) and trying to run a grid search over linear regression parameters.&lt;/P&gt;&lt;P&gt;My code looks like this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml import Pipeline 
pipeline = Pipeline(stages=[
                sql_transformer,
                assembler,
                lr])
paramGrid = ParamGridBuilder().addGrid(lr.solver, ["l-bfgs", "normal"]).build()
evaluator = RegressionEvaluator()
crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=evaluator,  # scores every param map on each fold
                          numFolds=3)
cvModel = crossval.fit(train)
cvModel.avgMetrics&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE&gt;out[] &amp;gt; [887.3183210064692, 787.3183297841774]&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;My question is: how can I find out which set of params corresponds to which metric?&lt;/P&gt;&lt;P&gt;And how can I get the params of the best trained model?&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jul 2017 10:56:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29167#M20919</guid>
      <dc:creator>pmezentsev</dc:creator>
      <dc:date>2017-07-28T10:56:25Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark. How to get best params in grid search</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29168#M20920</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;B&gt;To match the metrics with the sets of params:&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;'paramGrid' is a list of Param maps; 'avgMetrics' is a list of metrics. These 2 lists have the same order, so you can just zip them together:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;zip(cvModel.avgMetrics, paramGrid)&lt;/CODE&gt;&lt;/PRE&gt;
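&lt;P&gt;As a minimal sketch of that pairing, the snippet below uses stand-in values instead of a live Spark session: avg_metrics copies the numbers from the question, and param_maps is a hypothetical plain-dict version of paramGrid. The helper best_param_map is an illustrative name, not part of PySpark. It assumes a smaller metric is better, which holds for the default RegressionEvaluator metric (rmse); in a real session that flag could come from evaluator.isLargerBetter().&lt;/P&gt;

```python
# Stand-in data: in a real session these would be cvModel.avgMetrics and paramGrid.
avg_metrics = [887.3183210064692, 787.3183297841774]
param_maps = [{"solver": "l-bfgs"}, {"solver": "normal"}]  # hypothetical plain dicts


def best_param_map(metrics, param_maps, larger_is_better=False):
    """Pair each metric with its param map and return the best (metric, params) pair."""
    # list(...) makes the pairs reusable on Python 3, where zip is lazy.
    pairs = list(zip(metrics, param_maps))
    pick = max if larger_is_better else min
    return pick(pairs, key=lambda pair: pair[0])


best_metric, best_params = best_param_map(avg_metrics, param_maps)
print(best_metric, best_params)
```

&lt;P&gt;With the metrics from the question, this selects the second grid entry, because it has the lower average error.&lt;/P&gt;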
&lt;P&gt;&lt;B&gt;To find the best set of params:&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;If you have a CrossValidatorModel (after fitting a CrossValidator), then you can get the best model from the field called bestModel. You can then use extractParamMap to get the best model's parameters:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;bestPipeline = cvModel.bestModel
bestLRModel = bestPipeline.stages[2]
bestParams = bestLRModel.extractParamMap()&lt;/CODE&gt;&lt;/PRE&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 07 Sep 2017 15:56:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29168#M20920</guid>
      <dc:creator>Joseph_B</dc:creator>
      <dc:date>2017-09-07T15:56:46Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark. How to get best params in grid search</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29169#M20921</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I tried the code above, but bestParams still shows an empty list. Any thoughts?&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jan 2018 22:11:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29169#M20921</guid>
      <dc:creator>keerthana151094</dc:creator>
      <dc:date>2018-01-23T22:11:43Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark. How to get best params in grid search</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29170#M20922</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I tried this code, but extractParamMap() shows only some parameters; it does not show the best parameters from the paramGrid.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 24 Aug 2018 07:52:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29170#M20922</guid>
      <dc:creator>AldySyahdeini</dc:creator>
      <dc:date>2018-08-24T07:52:20Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark. How to get best params in grid search</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29171#M20923</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;This has been improved in Apache Spark 2.3.0 in &lt;A href="https://issues.apache.org/jira/browse/SPARK-10931" target="test_blank"&gt;https://issues.apache.org/jira/browse/SPARK-10931&lt;/A&gt; which copies Param values into the Python wrappers around Scala types. extractParamMap() extracts all Params; you have to look within it for the Params from the grid which you really care about.&lt;/P&gt; 
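&lt;P&gt;One way to do that filtering is sketched below with stand-in data rather than a live Spark session. Here full_param_map plays the role of bestLRModel.extractParamMap() and param_maps plays the role of paramGrid, both simplified to plain dicts keyed by parameter name; the values and the helper name tuned_values are assumptions for illustration only. In a real session the keys are Param objects, so the comparison would likely be made on each key's name attribute.&lt;/P&gt;

```python
# Stand-in for bestLRModel.extractParamMap(): every param, tuned or not.
full_param_map = {
    "solver": "normal",
    "maxIter": 100,
    "regParam": 0.0,
    "elasticNetParam": 0.0,
}
# Stand-in for paramGrid: one plain dict per grid point.
param_maps = [{"solver": "l-bfgs"}, {"solver": "normal"}]


def tuned_values(full_map, grid):
    """Keep only the entries of full_map whose names were varied by the grid."""
    grid_names = set()
    for param_map in grid:
        grid_names.update(param_map)  # adds the keys of each grid point
    return {name: value for name, value in full_map.items() if name in grid_names}


tuned = tuned_values(full_param_map, param_maps)
print(tuned)
```

&lt;P&gt;Only the solver entry survives, since it is the only parameter the grid varied.&lt;/P&gt;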
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 24 Aug 2018 23:01:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29171#M20923</guid>
      <dc:creator>Joseph_B</dc:creator>
      <dc:date>2018-08-24T23:01:21Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark. How to get best params in grid search</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29172#M20924</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Let me give you an example. When I call bestModel, I get a pyspark.ml.recommendation.ALSModel (the fitted model). What I really want is the pyspark.ml.recommendation.ALS estimator, which is why I cannot read parameters such as alpha from the model.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 27 Aug 2018 06:16:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29172#M20924</guid>
      <dc:creator>AldySyahdeini</dc:creator>
      <dc:date>2018-08-27T06:16:58Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark. How to get best params in grid search</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29173#M20925</link>
      <description>&lt;P&gt;Hi @pmezentsev,&lt;/P&gt;&lt;P&gt;You can build a parameter grid with different values for each parameter and then find the best params using scikit-learn's GridSearchCV. After fitting, the best combination is available as CV_rfc.best_params_:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;param_grid = {'n_estimators': [200, 500, 700], 'max_features': ['auto', 'sqrt', 'log2']}
CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=5)&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 13 Sep 2019 06:07:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29173#M20925</guid>
      <dc:creator>shyam_9</dc:creator>
      <dc:date>2019-09-13T06:07:20Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark. How to get best params in grid search</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29174#M20926</link>
      <description>&lt;P&gt;This is a great article. It gave me a lot of useful information. Thank you very much.&lt;/P&gt;</description>
      <pubDate>Thu, 28 May 2020 02:15:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-get-best-params-in-grid-search/m-p/29174#M20926</guid>
      <dc:creator>phamyen</dc:creator>
      <dc:date>2020-05-28T02:15:18Z</dc:date>
    </item>
  </channel>
</rss>

