Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Pyspark. How to get best params in grid search

pmezentsev
New Contributor

Hello!

I am using Spark 2.1.1 in Python

(Python 2.7, executed in a Jupyter notebook)

and am trying to run a grid search over linear regression parameters.

My code looks like this:

from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# sql_transformer, assembler, and lr are created earlier (not shown)
pipeline = Pipeline(stages=[
                sql_transformer,
                assembler,
                lr])
paramGrid = ParamGridBuilder().addGrid(lr.solver, ["l-bfgs", "normal"]).build()
evaluator = RegressionEvaluator()
crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=evaluator,
                          numFolds=3)
cvModel = crossval.fit(train)
cvModel.avgMetrics
# Output: [887.3183210064692, 787.3183297841774]

My question is: how can I find which set of params corresponds to which metric?

How can I get the params of the best trained model?

7 REPLIES

Joseph_B
New Contributor III

To match the metrics with the sets of params:

'paramGrid' is a list of Param maps; 'avgMetrics' is a list of metrics. These 2 lists have the same order, so you can just zip them together:

zip(cvModel.avgMetrics, paramGrid)
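
A minimal sketch of how the zipped pairs could be used to pick the winning combination. The helper name best_param_map and the plain-dict stand-ins for the param maps are illustrative assumptions, not part of the PySpark API; with a real CrossValidatorModel you would pass cvModel.avgMetrics, paramGrid, and evaluator.isLargerBetter() instead.

```python
def best_param_map(avg_metrics, param_maps, larger_is_better=False):
    """Return (metric, param_map) for the best-scoring grid entry.

    avg_metrics and param_maps must be in the same order, as
    cvModel.avgMetrics and paramGrid are.
    """
    pairs = list(zip(avg_metrics, param_maps))
    if not pairs:
        raise ValueError("no metrics to compare")
    pick = max if larger_is_better else min
    return pick(pairs, key=lambda pair: pair[0])


# Stand-in data: the two metrics from the question, with plain dicts
# in place of the {Param: value} maps that ParamGridBuilder produces.
metrics = [887.3183210064692, 787.3183297841774]
grid = [{"solver": "l-bfgs"}, {"solver": "normal"}]

best_metric, best_params = best_param_map(metrics, grid)
print(best_metric, best_params)
```

RegressionEvaluator's default metric is RMSE, where smaller is better, which is why larger_is_better defaults to False here. For other metrics, take the direction from evaluator.isLargerBetter() rather than hard-coding it.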

To find the best set of params:

If you have a CrossValidatorModel (after fitting a CrossValidator), then you can get the best model from the field called bestModel. You can then use extractParamMap to get the best model's parameters:

bestPipeline = cvModel.bestModel
bestLRModel = bestPipeline.stages[2]
bestParams = bestLRModel.extractParamMap()

I tried the code above, but bestParams still shows a null list. Any thoughts?

I tried this code, and extractParamMap() shows some parameters, but not the best parameters from the paramGrid.

Joseph_B
New Contributor III

This has been improved in Apache Spark 2.3.0 in https://issues.apache.org/jira/browse/SPARK-10931 which copies Param values into the Python wrappers around Scala types. extractParamMap() extracts all Params; you have to look within it for the Params from the grid which you really care about.
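
One way to "look within" the extracted map is to keep only the entries whose names appear in the grid. The sketch below is an assumption-laden illustration: grid_params_only is a hypothetical helper, and FakeParam is a stand-in for pyspark.ml.param.Param so the example runs without Spark. It relies only on each key having a .name attribute, which real Param objects do.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.ml.param.Param; only .name is used.
FakeParam = namedtuple("FakeParam", ["parent", "name"])


def grid_params_only(full_param_map, param_grid):
    """Keep only the entries whose Param name was tuned in the grid."""
    tuned_names = {param.name for param_map in param_grid for param in param_map}
    return {
        param.name: value
        for param, value in full_param_map.items()
        if param.name in tuned_names
    }


# Stand-ins for paramGrid and for bestLRModel.extractParamMap().
grid = [
    {FakeParam("lr", "solver"): "l-bfgs"},
    {FakeParam("lr", "solver"): "normal"},
]
full_map = {
    FakeParam("lr", "solver"): "normal",
    FakeParam("lr", "maxIter"): 100,
    FakeParam("lr", "regParam"): 0.0,
}

print(grid_params_only(full_map, grid))
```

Matching by name keeps the sketch simple, but it would conflate same-named params from different pipeline stages, so apply it to the map of one stage at a time. On versions before 2.3.0 the extracted map may not carry the tuned values at all, as described above.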

Let me give you an example. After I call bestModel, I get a pyspark.ml.recommendation.ALSModel (the fitted model). What I really want is the pyspark.ml.recommendation.ALS estimator, which is why I cannot get parameters such as alpha from the model.

shyam_9
Valued Contributor

Hi @pmezentsev,

You can build a param grid with different values of the parameters, and then you'll get the best params using scikit-learn's GridSearchCV:

from sklearn.model_selection import GridSearchCV

param_grid = {'n_estimators': [200, 500, 700], 'max_features': ['auto', 'sqrt', 'log2']}

CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=5)

