Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

When doing hyperparameter tuning with Hyperopt, when should I use SparkTrials? Does it work with both single-machine ML (like sklearn) and distributed ML (like Apache Spark ML)?

Joseph_B
New Contributor III

I want to know how to use Hyperopt in different situations:

  • Tuning a single-machine algorithm from scikit-learn or single-node TensorFlow
  • Tuning a distributed algorithm from Spark ML or distributed TensorFlow / Horovod
1 REPLY

Joseph_B
New Contributor III

The right question to ask is indeed: Is the algorithm you want to tune single-machine or distributed?

If it's a single-machine algorithm, such as anything from scikit-learn, then you can use SparkTrials with Hyperopt to distribute hyperparameter tuning across the cluster, with each trial running as a separate Spark task on a worker.
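
Here's a minimal sketch of that pattern (not from the original post; the dataset, search space, and `parallelism` value are illustrative assumptions):

```python
# Distributing tuning of a single-machine scikit-learn model with SparkTrials.
from hyperopt import fmin, tpe, hp, SparkTrials, STATUS_OK
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

def objective(params):
    # Each trial trains a single-machine model; SparkTrials runs many
    # such trials in parallel, one per Spark task.
    clf = RandomForestClassifier(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
        random_state=42,
    )
    score = cross_val_score(clf, X, y, cv=3).mean()
    return {"loss": -score, "status": STATUS_OK}

search_space = {
    "n_estimators": hp.quniform("n_estimators", 50, 300, 10),
    "max_depth": hp.quniform("max_depth", 2, 10, 1),
}

# parallelism controls how many trials run concurrently on the cluster.
spark_trials = SparkTrials(parallelism=4)
best = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,
    max_evals=50,
    trials=spark_trials,
)
print(best)
```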

If it's a distributed algorithm like any from Spark ML, then you should not use SparkTrials. You can run Hyperopt without a `trials` parameter (i.e., use the regular `Trials` type). That will run tuning on the cluster driver, leaving the full cluster available for each trial of your distributed algorithm.
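
And a sketch of the second case, tuning a distributed Spark ML algorithm with the default `Trials` on the driver (the `train_df`/`val_df` DataFrames and the search space are assumptions for illustration):

```python
# Tuning a distributed Spark ML algorithm with the default Trials, so the
# tuning loop runs on the driver while each trial can use the whole cluster.
from hyperopt import fmin, tpe, hp, Trials, STATUS_OK
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

# train_df and val_df are assumed to be existing Spark DataFrames
# with "features" and "label" columns.
evaluator = BinaryClassificationEvaluator(metricName="areaUnderROC")

def objective(params):
    # The Spark ML fit itself is distributed across the cluster.
    lr = LogisticRegression(
        regParam=params["regParam"],
        elasticNetParam=params["elasticNetParam"],
        maxIter=50,
    )
    model = lr.fit(train_df)
    auc = evaluator.evaluate(model.transform(val_df))
    return {"loss": -auc, "status": STATUS_OK}

search_space = {
    "regParam": hp.loguniform("regParam", -5, 0),
    "elasticNetParam": hp.uniform("elasticNetParam", 0.0, 1.0),
}

# No SparkTrials here: trials run sequentially on the driver, each one
# free to use the full cluster for the distributed fit.
best = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,
    max_evals=20,
    trials=Trials(),
)
print(best)
```

The key difference is just the `trials` argument: SparkTrials parallelizes trials across workers, while the default `Trials` keeps the tuning loop on the driver so each distributed fit can use the whole cluster.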

You can find more info on these in the docs (AWS, Azure, GCP).
