cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Machine Learning
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

For tuning hyperparameters with Apache Spark ML / MLlib, when should I use Spark ML's built-in tuning algorithms vs. Hyperopt?

Joseph_B
New Contributor III
New Contributor III

When should I use Spark ML's CrossValidator or TrainValidationSplit, vs. a separate tuning tool such as Hyperopt?

2 REPLIES 2

Joseph_B
New Contributor III
New Contributor III

Both are valid choices. By default, I'd recommend using Hyperopt nowadays. Here's the rationale, as pros & cons of each.

Spark ML's built-in tools

  • Pros: These fit the Spark ML Pipeline framework, so you can keep using the same type of APIs.
  • Cons: These are designed for brute force grid search. That's fine for a small number (say up to ~3) hyperparameters, but it becomes inefficient when you have many hyperparameters or when you want to test many combinations.

Hyperopt

  • Pros: This provides a more adaptive, iterative algorithm for tuning which can be more efficient in terms of the number of hyperparameter settings you need to try to reach a given accuracy. This is especially important when tuning many hyperparameters to testing many settings.
  • Cons: (See pros of Spark ML.)

Kaniz
Community Manager
Community Manager

Hi @Joseph Bradleyโ€‹ , Thanks for such an informative post!

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.