When doing hyperparameter tuning with Hyperopt, when should I use SparkTrials? Does it work with both single-machine ML (like sklearn) and distributed ML (like Apache Spark ML)?
06-09-2021 05:51 PM
I want to know how to use Hyperopt in different situations:
- Tuning a single-machine algorithm from scikit-learn or single-node TensorFlow
- Tuning a distributed algorithm from Spark ML or distributed TensorFlow / Horovod
06-09-2021 05:56 PM
The right question to ask is indeed: Is the algorithm you want to tune single-machine or distributed?
If it's a single-machine algorithm like any from scikit-learn, then you can use SparkTrials with Hyperopt to distribute hyperparameter tuning.
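Here's a minimal sketch of that pattern, assuming a cluster with Hyperopt and scikit-learn installed; the iris dataset, the search space, and the `parallelism` value are just illustrative:

```python
from hyperopt import fmin, tpe, hp, SparkTrials, STATUS_OK
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

def objective(params):
    # Each trial trains one single-machine sklearn model on a Spark worker.
    clf = RandomForestClassifier(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
    )
    score = cross_val_score(clf, X, y, cv=3).mean()
    # Hyperopt minimizes the loss, so return the negated accuracy.
    return {"loss": -score, "status": STATUS_OK}

search_space = {
    "n_estimators": hp.quniform("n_estimators", 10, 200, 10),
    "max_depth": hp.quniform("max_depth", 2, 10, 1),
}

# SparkTrials runs trials in parallel as Spark tasks; `parallelism`
# caps how many trials run concurrently.
spark_trials = SparkTrials(parallelism=4)

best = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,
    max_evals=40,
    trials=spark_trials,
)
print(best)
```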
If it's a distributed algorithm like any from Spark ML, then you should not use SparkTrials. Instead, run Hyperopt without a `trials` argument (it defaults to the regular in-memory `Trials` class). Tuning then runs sequentially on the cluster driver, leaving the full cluster available for each trial of your distributed algorithm.
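And a minimal sketch of the distributed case. This assumes an active Spark session and two illustrative DataFrames, `train_df` and `val_df`, each with `features` and `label` columns; the algorithm and search space are just examples:

```python
from hyperopt import fmin, tpe, hp, STATUS_OK
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

evaluator = BinaryClassificationEvaluator()  # areaUnderROC by default

def objective(params):
    # Each trial fits one distributed Spark ML model across the cluster.
    lr = LogisticRegression(
        regParam=params["regParam"],
        elasticNetParam=params["elasticNetParam"],
    )
    model = lr.fit(train_df)
    auc = evaluator.evaluate(model.transform(val_df))
    # Hyperopt minimizes the loss, so return the negated AUC.
    return {"loss": -auc, "status": STATUS_OK}

search_space = {
    "regParam": hp.loguniform("regParam", -5, 0),
    "elasticNetParam": hp.uniform("elasticNetParam", 0.0, 1.0),
}

# No `trials` argument: fmin uses the default in-memory Trials and runs
# trials one at a time on the driver, so each trial gets the whole cluster.
best = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,
    max_evals=20,
)
print(best)
```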
You can find more info on these in the docs (AWS, Azure, GCP).

