For tuning hyperparameters with Apache Spark ML / MLlib, when should I use Spark ML's built-in tuning algorithms vs. Hyperopt?
12-20-2021 08:43 AM
When should I use Spark ML's CrossValidator or TrainValidationSplit, vs. a separate tuning tool such as Hyperopt?
12-20-2021 08:51 AM
Both are valid choices. By default, I'd recommend using Hyperopt nowadays. Here's the rationale, as pros & cons of each.
Spark ML's built-in tools
- Pros: These fit the Spark ML Pipeline framework, so you can keep using the same type of APIs.
- Cons: These are designed for brute-force grid search. That's fine for a small number of hyperparameters (say, up to ~3), but it becomes inefficient when you have many hyperparameters or want to test many combinations, because the number of models trained grows multiplicatively with each parameter you add to the grid (see the sketch after this list).
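As a rough illustration, here's what grid search with CrossValidator might look like. This is a minimal sketch, not a full pipeline; the DataFrame name train_df and the specific parameter values are placeholders:

```python
# Minimal sketch of Spark ML's built-in grid search, assuming a DataFrame
# `train_df` with "features" and "label" columns.
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

lr = LogisticRegression(featuresCol="features", labelCol="label")

# Brute-force grid: every combination is trained, so cost grows
# multiplicatively with each hyperparameter (here 3 x 2 = 6 models per fold).
grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1, 1.0])
        .addGrid(lr.elasticNetParam, [0.0, 0.5])
        .build())

cv = CrossValidator(estimator=lr,
                    estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(labelCol="label"),
                    numFolds=3)

cv_model = cv.fit(train_df)  # best model is available as cv_model.bestModel
```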
Hyperopt
- Pros: This provides a more adaptive, iterative algorithm for tuning, which can be more efficient in terms of the number of hyperparameter settings you need to try to reach a given accuracy. This is especially important when tuning many hyperparameters or testing many settings (see the sketch after this list).
- Cons: (See pros of Spark ML.) It sits outside the Spark ML Pipeline framework, so you write your own training-and-evaluation loop instead of plugging into the CrossValidator-style APIs.
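For comparison, here's a minimal sketch of the same tuning problem with Hyperopt's adaptive TPE algorithm. The DataFrames train_df and val_df and the search-space bounds are placeholder assumptions; note that when the model itself trains distributed with Spark ML, you run fmin on the driver rather than using Hyperopt's SparkTrials:

```python
# Minimal sketch of adaptive tuning with Hyperopt over a Spark ML model,
# assuming DataFrames `train_df` and `val_df` with "features"/"label" columns.
from hyperopt import fmin, tpe, hp, STATUS_OK
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

evaluator = BinaryClassificationEvaluator(labelCol="label")

def objective(params):
    lr = LogisticRegression(featuresCol="features", labelCol="label",
                            regParam=params["regParam"],
                            elasticNetParam=params["elasticNetParam"])
    model = lr.fit(train_df)
    auc = evaluator.evaluate(model.transform(val_df))
    # Hyperopt minimizes the loss, so negate the metric we want to maximize.
    return {"loss": -auc, "status": STATUS_OK}

# Continuous search space instead of a fixed grid; TPE picks each trial
# based on the results of previous trials.
search_space = {
    "regParam": hp.loguniform("regParam", -5, 0),
    "elasticNetParam": hp.uniform("elasticNetParam", 0, 1),
}

best = fmin(fn=objective, space=search_space, algo=tpe.suggest, max_evals=32)
```

The practical difference: with the grid above you pay for every combination up front, while fmin spends a fixed budget of trials (here 32) and concentrates them in promising regions of the space.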

