Pyspark. How to get best params in grid search
07-28-2017 03:56 AM
Hello!
I am using Spark 2.1.1 from Python (Python 2.7, running in a Jupyter notebook) and trying to run a grid search over linear regression parameters.
My code looks like this:
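from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# sql_transformer, assembler and lr (the linear regression stage) are defined earlier
pipeline = Pipeline(stages=[
    sql_transformer,
    assembler,
    lr])
paramGrid = ParamGridBuilder().addGrid(lr.solver, ["l-bfgs", "normal"]).build()
evaluator = RegressionEvaluator()
# The evaluator has to be passed in, otherwise CrossValidator cannot score the folds
crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=evaluator,
                          numFolds=3)
cvModel = crossval.fit(train)
cvModel.avgMetrics
out[] > [887.3183210064692, 787.3183297841774]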
My question is: how can I find out which set of params each metric corresponds to?
And how can I get the params of the best trained model?
Labels: Gridsearchcv, Mllib, Pyspark
09-07-2017 08:56 AM
To match the metrics with the sets of params:
'paramGrid' is a list of Param maps; 'avgMetrics' is a list of metrics. These 2 lists have the same order, so you can just zip them together:
zip(cvModel.avgMetrics, paramGrid)
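As a minimal sketch of that pairing, the snippet below zips the metrics with the param maps and sorts them best-first. The helper name `rank_param_maps` is made up for illustration, and the param maps are plain-dict stand-ins (real ones are keyed by `Param` objects, not strings). The metric values are the ones from the question, and the sketch assumes the default RegressionEvaluator metric (rmse), where lower is better.

```python
def rank_param_maps(avg_metrics, param_maps, larger_is_better=False):
    """Pair each average metric with its param map and sort best-first.

    Hypothetical helper, not part of pyspark.
    """
    pairs = list(zip(avg_metrics, param_maps))
    # Sort on the metric only, so the param maps themselves are never compared.
    return sorted(pairs, key=lambda pair: pair[0], reverse=larger_is_better)


# Stand-in data: metrics copied from the question, param maps simplified to
# plain dicts in the same order as the grid (["l-bfgs", "normal"]).
avg_metrics = [887.3183210064692, 787.3183297841774]
param_maps = [{"solver": "l-bfgs"}, {"solver": "normal"}]

ranked = rank_param_maps(avg_metrics, param_maps)
best_metric, best_params = ranked[0]
print(best_metric, best_params)
```

With the real objects this would probably be called as rank_param_maps(cvModel.avgMetrics, paramGrid, evaluator.isLargerBetter()), so that the sort direction follows whatever metric the evaluator uses.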
To find the best set of params:
If you have a CrossValidatorModel (after fitting a CrossValidator), then you can get the best model from the field called bestModel. You can then use extractParamMap to get the best model's parameters:
bestPipeline = cvModel.bestModel
bestLRModel = bestPipeline.stages[2]
bestParams = bestLRModel.extractParamMap()
01-23-2018 02:11 PM
I tried the code above, but bestParams still comes back empty. Any thoughts?
08-24-2018 12:52 AM
I tried this code. extractParamMap() shows some parameters, but it does not show the best values for the parameters in paramGrid.
08-24-2018 04:01 PM
This has been improved in Apache Spark 2.3.0 in https://issues.apache.org/jira/browse/SPARK-10931 which copies Param values into the Python wrappers around Scala types. extractParamMap() extracts all Params; you have to look within it for the Params from the grid which you really care about.
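One way to "look within" the full map, sketched under some assumptions: collect the names of the Params that appear in the grid, then keep only those entries from the extracted map. The helper `grid_params_only` and the `FakeParam` type are hypothetical and exist only so the snippet runs without Spark; real `Param` objects also expose a `name` attribute. Matching by name alone is a simplification that would conflate same-named Params from different pipeline stages.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.ml.param.Param (which also has `.name`).
FakeParam = namedtuple("FakeParam", ["parent", "name"])


def grid_params_only(full_param_map, param_grid):
    """Keep only the entries whose Param name was tuned in the grid.

    Hypothetical helper, not part of pyspark.
    """
    tuned_names = {param.name for param_map in param_grid for param in param_map}
    return {
        param.name: value
        for param, value in full_param_map.items()
        if param.name in tuned_names
    }


# Stand-in for paramGrid: one map per grid point.
solver = FakeParam(parent="lr", name="solver")
param_grid = [{solver: "l-bfgs"}, {solver: "normal"}]

# Stand-in for bestLRModel.extractParamMap(): every Param, tuned or not.
full_param_map = {
    solver: "normal",
    FakeParam(parent="lr", name="maxIter"): 100,
    FakeParam(parent="lr", name="regParam"): 0.0,
}

print(grid_params_only(full_param_map, param_grid))
```

With real objects the call would presumably look like grid_params_only(bestLRModel.extractParamMap(), paramGrid), on a Spark version where the Python wrapper actually carries the tuned values.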
08-26-2018 11:16 PM
Let me give you an example. When I call bestModel, I get a pyspark.ml.recommendation.ALSModel (the fitted model). What I really want is the pyspark.ml.recommendation.ALS estimator, which is why I cannot get parameters such as alpha from the model.
09-12-2019 11:07 PM
05-27-2020 07:15 PM
This is a great article. It gave me a lot of useful information. Thank you very much.