<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: When doing hyperparameter tuning with Hyperopt, when should I use SparkTrials?  Does it work with both single-machine ML (like sklearn) and distributed ML (like Apache Spark ML)? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/when-doing-hyperparameter-tuning-with-hyperopt-when-should-i-use/m-p/25352#M17623</link>
    <description>&lt;P&gt;The right question to ask is indeed: Is the algorithm you want to tune single-machine or distributed?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If it's a single-machine algorithm like any from scikit-learn, then you can use SparkTrials with Hyperopt to distribute hyperparameter tuning.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If it's a distributed algorithm like any from Spark ML, then you should not use SparkTrials.  You can run Hyperopt without a `trials` parameter (i.e., use the regular `Trials` type).  That will run tuning on the cluster driver, leaving the full cluster available for each trial of your distributed algorithm.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You can find more info on these in the docs (&lt;A href="https://docs.databricks.com/applications/machine-learning/automl-hyperparam-tuning/index.html#hyperparameter-tuning-with-hyperopt" alt="https://docs.databricks.com/applications/machine-learning/automl-hyperparam-tuning/index.html#hyperparameter-tuning-with-hyperopt" target="_blank"&gt;AWS&lt;/A&gt;, &lt;A href="https://docs.microsoft.com/en-us/azure/databricks/applications/machine-learning/automl-hyperparam-tuning/#--hyperparameter-tuning-with-hyperopt" alt="https://docs.microsoft.com/en-us/azure/databricks/applications/machine-learning/automl-hyperparam-tuning/#--hyperparameter-tuning-with-hyperopt" target="_blank"&gt;Azure&lt;/A&gt;, &lt;A href="https://docs.gcp.databricks.com/applications/machine-learning/automl-hyperparam-tuning/index.html#hyperparameter-tuning-with-hyperopt" alt="https://docs.gcp.databricks.com/applications/machine-learning/automl-hyperparam-tuning/index.html#hyperparameter-tuning-with-hyperopt" target="_blank"&gt;GCP&lt;/A&gt;).&lt;/P&gt;</description>
    <pubDate>Thu, 10 Jun 2021 00:56:20 GMT</pubDate>
    <dc:creator>Joseph_B</dc:creator>
    <dc:date>2021-06-10T00:56:20Z</dc:date>
    <item>
      <title>When doing hyperparameter tuning with Hyperopt, when should I use SparkTrials?  Does it work with both single-machine ML (like sklearn) and distributed ML (like Apache Spark ML)?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-doing-hyperparameter-tuning-with-hyperopt-when-should-i-use/m-p/25351#M17622</link>
      <description>&lt;P&gt;I want to know how to use Hyperopt in different situations:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Tuning a single-machine algorithm from scikit-learn or single-node TensorFlow&lt;/LI&gt;&lt;LI&gt;Tuning a distributed algorithm from Spark ML or distributed TensorFlow / Horovod&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Thu, 10 Jun 2021 00:51:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-doing-hyperparameter-tuning-with-hyperopt-when-should-i-use/m-p/25351#M17622</guid>
      <dc:creator>Joseph_B</dc:creator>
      <dc:date>2021-06-10T00:51:24Z</dc:date>
    </item>
    <item>
      <title>Re: When doing hyperparameter tuning with Hyperopt, when should I use SparkTrials?  Does it work with both single-machine ML (like sklearn) and distributed ML (like Apache Spark ML)?</title>
      <link>https://community.databricks.com/t5/data-engineering/when-doing-hyperparameter-tuning-with-hyperopt-when-should-i-use/m-p/25352#M17623</link>
      <description>&lt;P&gt;The right question to ask is indeed: Is the algorithm you want to tune single-machine or distributed?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If it's a single-machine algorithm like any from scikit-learn, then you can use SparkTrials with Hyperopt to distribute hyperparameter tuning.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If it's a distributed algorithm like any from Spark ML, then you should not use SparkTrials.  You can run Hyperopt without a `trials` parameter (i.e., use the regular `Trials` type).  That will run tuning on the cluster driver, leaving the full cluster available for each trial of your distributed algorithm.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You can find more info on these in the docs (&lt;A href="https://docs.databricks.com/applications/machine-learning/automl-hyperparam-tuning/index.html#hyperparameter-tuning-with-hyperopt" alt="https://docs.databricks.com/applications/machine-learning/automl-hyperparam-tuning/index.html#hyperparameter-tuning-with-hyperopt" target="_blank"&gt;AWS&lt;/A&gt;, &lt;A href="https://docs.microsoft.com/en-us/azure/databricks/applications/machine-learning/automl-hyperparam-tuning/#--hyperparameter-tuning-with-hyperopt" alt="https://docs.microsoft.com/en-us/azure/databricks/applications/machine-learning/automl-hyperparam-tuning/#--hyperparameter-tuning-with-hyperopt" target="_blank"&gt;Azure&lt;/A&gt;, &lt;A href="https://docs.gcp.databricks.com/applications/machine-learning/automl-hyperparam-tuning/index.html#hyperparameter-tuning-with-hyperopt" alt="https://docs.gcp.databricks.com/applications/machine-learning/automl-hyperparam-tuning/index.html#hyperparameter-tuning-with-hyperopt" target="_blank"&gt;GCP&lt;/A&gt;).&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jun 2021 00:56:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-doing-hyperparameter-tuning-with-hyperopt-when-should-i-use/m-p/25352#M17623</guid>
      <dc:creator>Joseph_B</dc:creator>
      <dc:date>2021-06-10T00:56:20Z</dc:date>
    </item>
  </channel>
</rss>