Best practices: Hyperparameter tuning with Hyperopt
- Bayesian approaches can be much more efficient than grid search and random search. Hence, with the Hyperopt Tree of Parzen Estimators (TPE) algorithm, you can explore more hyperparameters and larger ranges. Using domain knowledge to restrict the search domain can optimize tuning and produce better results.
- When you use hp.choice(), Hyperopt returns the index of the choice list. Therefore the parameter logged in MLflow is also the index. Use hyperopt.space_eval() to retrieve the parameter values.
- For models with long training times, start experimenting with small datasets and many hyperparameters. Use MLflow to identify the best performing models and determine which hyperparameters can be fixed. In this way, you can reduce the parameter space as you prepare to tune at scale.
- Take advantage of Hyperopt support for conditional dimensions and hyperparameters. For example, when you evaluate multiple flavors of gradient descent, instead of limiting the hyperparameter space to just the common hyperparameters, you can have Hyperopt include conditional hyperparameters, the ones that are only appropriate for a subset of the flavors. For more information about using conditional parameters, see Defining a search space.
- When using SparkTrials, configure parallelism appropriately for CPU-only versus GPU-enabled clusters. In Azure Databricks, CPU and GPU clusters use different numbers of executor threads per worker node. CPU clusters use multiple executor threads per node. GPU clusters use only one executor thread per node to avoid conflicts among multiple Spark tasks trying to use the same GPU. While this is generally optimal for libraries written for GPUs, it means that maximum parallelism is reduced on GPU clusters, so be aware of how many GPUs each trial can use when selecting GPU instance types. See GPU-enabled Clusters for details.
- Do not use SparkTrials on autoscaling clusters. Hyperopt selects the parallelism value when execution begins. If the cluster later autoscales, Hyperopt will not be able to take advantage of the new cluster size.