
Pyspark. How to get best params in grid search

pmezentsev
New Contributor

Hello!

I am using Spark 2.1.1 with Python

(Python 2.7, run in a Jupyter notebook)

and am trying to run a grid search over linear regression parameters.

My code looks like this:

from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# sql_transformer, assembler and lr (the linear regression stage) are defined earlier
pipeline = Pipeline(stages=[
                sql_transformer,
                assembler,
                lr])
paramGrid = ParamGridBuilder().addGrid(lr.solver, ["l-bfgs", "normal"]).build()
evaluator = RegressionEvaluator()
# CrossValidator needs the evaluator to score each param map
crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=evaluator,
                          numFolds=3)
cvModel = crossval.fit(train)
cvModel.avgMetrics
out[] > [887.3183210064692, 787.3183297841774]

My question is: how can I find out which set of params corresponds to which metric?

How can I get the params of the best trained model?


Joseph_B
New Contributor III

To match the metrics with the sets of params:

'paramGrid' is a list of Param maps; 'avgMetrics' is a list of metrics. These 2 lists have the same order, so you can just zip them together:

zip(cvModel.avgMetrics, paramGrid)
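The pairing can be made easier to read by replacing each Param key with its name. The following is a minimal sketch, not part of the PySpark API: `pair_metrics_with_params` is a hypothetical helper, and the lists at the bottom are stand-ins for `cvModel.avgMetrics` and `paramGrid` (plain strings are used as keys so the snippet runs without Spark; with real Param objects the helper reads their `name` attribute).

```python
def pair_metrics_with_params(avg_metrics, param_maps):
    """Return [(metric, {param_name: value}), ...] in grid order."""
    pairs = []
    for metric, param_map in zip(avg_metrics, param_maps):
        # PySpark Param objects carry a .name attribute; plain strings
        # (used in the stand-in data below) fall back to themselves.
        readable = {getattr(param, "name", param): value
                    for param, value in param_map.items()}
        pairs.append((metric, readable))
    return pairs


# Stand-in values mirroring the question; with Spark you would pass
# cvModel.avgMetrics and paramGrid instead.
avg_metrics = [887.3183210064692, 787.3183297841774]
param_maps = [{"solver": "l-bfgs"}, {"solver": "normal"}]

pairs = pair_metrics_with_params(avg_metrics, param_maps)
for metric, params in pairs:
    print(metric, params)
```

Keeping the grid order (rather than sorting) means the index of each pair still matches the index in `paramGrid`.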

To find the best set of params:

If you have a CrossValidatorModel (after fitting a CrossValidator), then you can get the best model from the field called bestModel. You can then use extractParamMap to get the best model's parameters:

bestPipeline = cvModel.bestModel
bestLRModel = bestPipeline.stages[2]
bestParams = bestLRModel.extractParamMap()

I tried the code above, but bestParams still shows an empty list. Any thoughts?

I tried this code too, but extractParamMap() shows some parameters, not the best parameters from the paramGrid.

Joseph_B
New Contributor III

This has been improved in Apache Spark 2.3.0 in https://issues.apache.org/jira/browse/SPARK-10931 which copies Param values into the Python wrappers around Scala types. extractParamMap() extracts all Params; you have to look within it for the Params from the grid which you really care about.
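Looking within the full map for the grid's Params can be sketched as a filter by parameter name. This is an illustration under assumptions: `grid_params_only` is a hypothetical helper, the dictionaries are stand-ins for `bestLRModel.extractParamMap()` and `paramGrid`, and matching by name assumes no two tuned Params share a name.

```python
def _name(param):
    # PySpark Param objects carry a .name; plain strings are used as-is.
    return getattr(param, "name", param)


def grid_params_only(full_param_map, param_grid):
    """Keep only the entries whose param name was tuned in the grid."""
    tuned_names = {_name(p) for param_map in param_grid for p in param_map}
    return {_name(p): value
            for p, value in full_param_map.items()
            if _name(p) in tuned_names}


# Stand-in data: the full map holds every param, the grid only 'solver'.
full_param_map = {"solver": "normal", "maxIter": 100, "regParam": 0.0}
param_grid = [{"solver": "l-bfgs"}, {"solver": "normal"}]

best_grid_params = grid_params_only(full_param_map, param_grid)
print(best_grid_params)
```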

Let me give you an example. After I access bestModel, I get a pyspark.ml.recommendation.ALSModel (the fitted model). What I really want is the pyspark.ml.recommendation.ALS estimator, which is why I cannot get a parameter such as alpha from the model.
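One possible workaround when the fitted model does not expose the estimator's params is to skip the model entirely: find the index of the best average metric and look up the same index in the param grid. The sketch below assumes the two lists share an order, as described earlier in the thread. `best_param_map` is a hypothetical helper, the `alpha` values and metrics are made-up stand-ins, and `larger_is_better` would come from `evaluator.isLargerBetter()` in PySpark.

```python
def best_param_map(avg_metrics, param_maps, larger_is_better=False):
    """Return (index, param_map) of the grid entry with the best metric."""
    pick = max if larger_is_better else min
    best_index = pick(range(len(avg_metrics)), key=lambda i: avg_metrics[i])
    return best_index, param_maps[best_index]


# Stand-in data for cvModel.avgMetrics and paramGrid. An error metric
# such as RMSE is assumed, so a smaller value is better.
avg_metrics = [1.42, 1.17, 1.31]
param_maps = [{"alpha": 0.5}, {"alpha": 1.0}, {"alpha": 2.0}]

best_index, best_params = best_param_map(avg_metrics, param_maps)
print(best_index, best_params)
```

Because the lookup uses only the grid and the metrics, it does not depend on which Spark version copied Param values onto the fitted model.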

shyam_9
Valued Contributor

Hi @pmezentsev,

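You can build a param grid with different values for each parameter and then get the best params using scikit-learn's GridSearchCV. After fitting, the best combination is available as CV_rfc.best_params_.

from sklearn.model_selection import GridSearchCV

param_grid = {'n_estimators': [200, 500, 700], 'max_features': ['auto', 'sqrt', 'log2']}

CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=5)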

phamyen
New Contributor II

This is a great article. It gave me a lot of useful information. Thank you very much.
