07-28-2017 03:56 AM
Hello!
I am using Spark 2.1.1 with Python 2.7 in a Jupyter notebook,
and I am trying to run a grid search over linear regression parameters.
My code looks like this:
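from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# sql_transformer, assembler, lr (a LinearRegression) and the train DataFrame are defined earlier
pipeline = Pipeline(stages=[
    sql_transformer,
    assembler,
    lr])
paramGrid = ParamGridBuilder().addGrid(lr.solver, ["l-bfgs", "normal"]).build()
evaluator = RegressionEvaluator()  # default metric is rmse (lower is better)
crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=evaluator,  # CrossValidator has no default evaluator
                          numFolds=3)
cvModel = crossval.fit(train)
cvModel.avgMetrics
# Out: [887.3183210064692, 787.3183297841774]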
My question is: how can I tell which set of params each metric corresponds to?
How can I get the params of the best trained model?
09-07-2017 08:56 AM
To match the metrics with the sets of params:
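'paramGrid' is a list of ParamMaps; 'avgMetrics' is a list of metrics, one average over the folds for each ParamMap. The two lists are in the same order, so you can just zip them together:
list(zip(cvModel.avgMetrics, paramGrid))  # list() is only needed on Python 3, where zip is lazy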
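A minimal sketch of the zip idea, assuming `cvModel` and `paramGrid` from the question are in scope. Each ParamMap is a dict keyed by Param objects, which prints noisily, so the hypothetical helpers `param_names` and `readable_results` (not part of PySpark) key each map by `Param.name` instead. They are plain Python, so they should also run on Python 2.7.

```python
def param_names(param_map):
    """Turn a ParamMap (dict of Param -> value) into a dict of name -> value.

    Hypothetical helper. PySpark Param objects carry a .name attribute; keys
    without one (e.g. plain strings) are kept as-is.
    """
    return {getattr(p, "name", p): v for p, v in param_map.items()}


def readable_results(avg_metrics, param_grid):
    """Pair each average metric with the parameter values that produced it.

    Hypothetical helper. avgMetrics is in the same order as the ParamMaps
    passed as estimatorParamMaps, so a positional zip lines them up.
    """
    if len(avg_metrics) != len(param_grid):
        raise ValueError("avgMetrics and paramGrid must have the same length")
    return [(param_names(pm), m) for m, pm in zip(avg_metrics, param_grid)]
```

With the grid from the question, `readable_results(cvModel.avgMetrics, paramGrid)` should pair `{'solver': 'l-bfgs'}` with the first metric and `{'solver': 'normal'}` with the second, because ParamGridBuilder keeps the value order given to `addGrid`.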
To find the best set of params:
If you have a CrossValidatorModel (after fitting a CrossValidator), then you can get the best model from the field called bestModel. You can then use extractParamMap to get the best model's parameters:
bestPipeline = cvModel.bestModel            # a fitted PipelineModel
bestLRModel = bestPipeline.stages[2]        # lr was the third stage of the pipeline
bestParams = bestLRModel.extractParamMap()  # dict of Param -> value
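The `stages[2]` index only works while the pipeline keeps exactly this stage order. A small sketch of an alternative, assuming you know the fitted stage's class: the hypothetical helper `find_stage` (not a PySpark function) picks the first stage of a fitted PipelineModel that is an instance of that class.

```python
def find_stage(pipeline_model, stage_type):
    """Return the first stage of a fitted PipelineModel that is an instance
    of stage_type, e.g. pyspark.ml.regression.LinearRegressionModel.

    Hypothetical helper. Unlike a hard-coded index such as stages[2], it
    keeps working if stages are added or reordered.
    """
    matches = [s for s in pipeline_model.stages if isinstance(s, stage_type)]
    if not matches:
        raise LookupError("no stage of type %s in pipeline" % stage_type.__name__)
    return matches[0]
```

For example, after `from pyspark.ml.regression import LinearRegressionModel`, you could write `bestLRModel = find_stage(cvModel.bestModel, LinearRegressionModel)`.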
01-23-2018 02:11 PM
I tried the code above, but bestParams is still empty. Any thoughts?
08-24-2018 12:52 AM
I tried this code. extractParamMap() shows some parameters, but not the best values from the paramGrid.
08-24-2018 04:01 PM
This was improved in Apache Spark 2.3.0 by https://issues.apache.org/jira/browse/SPARK-10931, which copies Param values into the Python wrappers around Scala types. Note that extractParamMap() extracts all Params, so you have to look within it for the Params from the grid that you really care about.
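A sketch of that filtering step, assuming `bestLRModel` and `paramGrid` from the earlier reply. The hypothetical helper `grid_params_of` (not part of PySpark) keeps only the entries of an extracted ParamMap whose names were varied in the grid. On versions where the model wrapper does not carry the values (as reported above for Spark before 2.3.0), it may simply return an empty dict.

```python
def grid_params_of(param_map, param_grid):
    """Keep only the Params of param_map that were varied in param_grid.

    Hypothetical helper. param_map is what extractParamMap() returns
    (Param -> value, defaults included); the result is keyed by Param name.
    Keys without a .name attribute (e.g. plain strings) are used as-is.
    """
    grid_names = set(getattr(p, "name", p) for pm in param_grid for p in pm)
    return {
        getattr(p, "name", p): v
        for p, v in param_map.items()
        if getattr(p, "name", p) in grid_names
    }
```

Usage: `grid_params_of(bestLRModel.extractParamMap(), paramGrid)` should return something like `{'solver': ...}` holding the winning solver.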
08-26-2018 11:16 PM
Let me give you an example. After I access bestModel, I get a pyspark.ml.recommendation.ALSModel (the fitted model). What I really want is pyspark.ml.recommendation.ALS (the estimator); this is why I cannot get some parameters from the model, for example alpha.
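One possible workaround, sketched here as an assumption rather than something confirmed in this thread: read the winning values from the estimator's paramGrid instead of from the fitted model. The hypothetical helper `best_param_map` (not a PySpark function) repeats the selection rule CrossValidator applies to avgMetrics (highest metric if the evaluator's `isLargerBetter()` is true, otherwise lowest, first one on ties), so estimator-only Params such as ALS `alpha` appear whenever they were in the grid.

```python
def best_param_map(avg_metrics, param_grid, larger_is_better):
    """Return (winning ParamMap, its average metric) from a CrossValidator run.

    Hypothetical helper. The winner is looked up in the estimator's param
    grid, so it includes Params that the fitted model may not expose.
    Pass evaluator.isLargerBetter() as larger_is_better.
    """
    pick = max if larger_is_better else min
    # max/min return the first index on ties, like CrossValidator's choice.
    best_idx = pick(range(len(avg_metrics)), key=lambda i: avg_metrics[i])
    return param_grid[best_idx], avg_metrics[best_idx]
```

Usage: `best_map, best_metric = best_param_map(cvModel.avgMetrics, paramGrid, evaluator.isLargerBetter())`. With the question's RMSE values, this should select the `"normal"` solver with 787.318...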