For tuning hyperparameters with Apache Spark ML / ... - Databricks - 32744

When should I use Spark ML's CrossValidator or TrainValidationSplit, vs. a separate tuning tool such as Hyperopt?

Both are valid choices. By default, I'd recommend using Hyperopt nowadays. Here's the rationale, as pros & cons of each.

Spark ML's built-in tools

Pros: These fit the Spark ML Pipeline framework, so you can keep using the same type of APIs.
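Cons: These are designed for brute-force grid search. That's fine for a small number of hyperparameters (say, up to ~3), but it becomes inefficient when you have many hyperparameters or when you want to test many combinations, because the number of combinations is the product of the number of candidate values for each hyperparameter.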
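To make the grid-search cost concrete, here is a rough sketch in plain Python (no Spark needed) that counts how many model fits a full grid implies. The hyperparameter names and candidate values are only illustrative labels, and the fold count of 3 is an assumption for the example; substitute whatever your own setup uses.

```python
from itertools import product

# Illustrative candidate values; the keys are just labels for this sketch.
grid = {
    "regParam": [0.001, 0.01, 0.1, 1.0],
    "elasticNetParam": [0.0, 0.5, 1.0],
    "maxIter": [10, 50, 100],
}

def grid_combinations(grid):
    """Enumerate every combination of candidate values (a full grid)."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]

def fits_required(grid, num_folds):
    """Each combination is fitted once per cross-validation fold."""
    return len(grid_combinations(grid)) * num_folds

combos = grid_combinations(grid)
print(len(combos))                       # 4 * 3 * 3 = 36 combinations
print(fits_required(grid, num_folds=3))  # 36 * 3 = 108 model fits
```

Adding a fourth hyperparameter with 4 candidate values would multiply both numbers by 4, which is why a full grid gets expensive quickly.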

Hyperopt

Pros: This provides a more adaptive, iterative tuning algorithm, which can be more efficient in terms of the number of hyperparameter settings you need to try to reach a given accuracy. This is especially important when tuning many hyperparameters or testing many settings.
Cons: (See pros of Spark ML.)

Hi @Joseph Bradley , Thanks for such an informative post!
