Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

When doing hyperparameter tuning with Hyperopt, when should I use SparkTrials? Does it work with both single-machine ML (like sklearn) and distributed ML (like Apache Spark ML)?

Joseph_B
Databricks Employee

I want to know how to use Hyperopt in different situations:

  • Tuning a single-machine algorithm from scikit-learn or single-node TensorFlow
  • Tuning a distributed algorithm from Spark ML or distributed TensorFlow / Horovod
1 REPLY

Joseph_B
Databricks Employee

The right question to ask is indeed: Is the algorithm you want to tune single-machine or distributed?

If it's a single-machine algorithm like any from scikit-learn, then you can use SparkTrials with Hyperopt to distribute hyperparameter tuning.
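As a minimal sketch of that pattern (not code from this thread; the dataset, search space, and parallelism value are illustrative assumptions), you can wrap a scikit-learn model in an objective function and pass a `SparkTrials` object to `fmin` so each trial runs as a Spark task on a worker:

```python
# Sketch: distributing tuning of a single-machine sklearn model with SparkTrials.
from hyperopt import fmin, tpe, hp, SparkTrials, STATUS_OK
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)  # illustrative dataset

def objective(params):
    # Each trial trains one single-machine model on a Spark worker.
    model = RandomForestClassifier(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
    )
    accuracy = cross_val_score(model, X, y, cv=3).mean()
    # Hyperopt minimizes the loss, so return negative accuracy.
    return {"loss": -accuracy, "status": STATUS_OK}

search_space = {
    "n_estimators": hp.quniform("n_estimators", 50, 300, 10),
    "max_depth": hp.quniform("max_depth", 2, 10, 1),
}

# SparkTrials runs trials in parallel across the cluster; parallelism is a knob you tune.
spark_trials = SparkTrials(parallelism=4)
best = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,
    max_evals=32,
    trials=spark_trials,
)
print(best)
```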

If it's a distributed algorithm like any from Spark ML, then you should not use SparkTrials. You can run Hyperopt without a `trials` parameter (i.e., use the regular `Trials` type). That will run tuning on the cluster driver, leaving the full cluster available for each trial of your distributed algorithm.
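Here is a minimal sketch of that second case, assuming a Databricks cluster with an active `SparkSession` and a hypothetical training DataFrame `train_df` with "features" and "label" columns. Hyperopt itself runs on the driver with the default `Trials`, and each evaluation launches a distributed Spark ML training job across the cluster:

```python
# Sketch: tuning a distributed Spark ML algorithm with plain Trials (no SparkTrials).
from hyperopt import fmin, tpe, hp, Trials, STATUS_OK
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

evaluator = BinaryClassificationEvaluator(labelCol="label")  # default metric: areaUnderROC

def objective(params):
    lr = LogisticRegression(
        regParam=params["regParam"],
        elasticNetParam=params["elasticNetParam"],
        labelCol="label",
        featuresCol="features",
    )
    model = lr.fit(train_df)  # distributed training across the cluster
    auc = evaluator.evaluate(model.transform(train_df))
    return {"loss": -auc, "status": STATUS_OK}

search_space = {
    "regParam": hp.loguniform("regParam", -5, 0),
    "elasticNetParam": hp.uniform("elasticNetParam", 0.0, 1.0),
}

# No SparkTrials here: the default Trials keeps the tuning loop on the driver,
# leaving the full cluster free for each distributed training job.
best = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,
    max_evals=16,
    trials=Trials(),
)
print(best)
```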

You can find more info on these in the docs (AWS, Azure, GCP).
