Proper mlflow run logging with SparkTrials and Hyperopt
02-10-2025 08:48 AM - edited 02-10-2025 08:49 AM
Hello!
I'm attempting to run a hyperparameter search using Hyperopt and SparkTrials() and log the resulting runs to an existing experiment (experiment A). I can see on this page that Databricks suggests wrapping the `fmin()` call in a `mlflow.start_run()` block, so I wrote the following code in a notebook, which by default is associated with its own experiment (experiment B):
```python
import mlflow
from hyperopt import fmin, SparkTrials

with mlflow.start_run(experiment_id=experiment_A):
    trials = SparkTrials()
    fmin(objective_fn,
         trials=trials,
         **other_params)
```
However, the result I get is a single parent run logged in experiment_A with all its child runs logged in experiment_B. This is obviously not desirable. Is there any way I can get both the parent and child runs from the trials logged in experiment A (i.e., not the experiment associated with the notebook being run)?
11-06-2025 03:54 AM
In Databricks, the child runs of a Hyperopt sweep are, by default, logged to the experiment associated with the notebook context rather than to the experiment passed explicitly to mlflow.start_run(). As you noticed, the parent run is created in your chosen experiment (experiment A), but the child runs remain in the notebook's experiment (experiment B).
Why This Happens
- The experiment_id you provide to mlflow.start_run() sets the experiment for the parent run only.
- However, with SparkTrials() (which parallelizes trials across Spark workers), each child run launched for a trial uses the notebook's default experiment context, typically experiment B, instead of inheriting the parent run's experiment.
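To confirm where the child runs of a given sweep actually landed, you can search for runs whose `mlflow.parentRunId` tag points at the parent run. This is a minimal sketch, assuming the child runs carry the standard `mlflow.parentRunId` tag that MLflow uses for nested runs, and that you pass in an `MlflowClient` (from `mlflow.tracking`) plus the IDs of experiments A and B. `find_child_runs` and `experiments_of` are hypothetical helper names, not part of MLflow.

```python
from typing import Any, List, Optional, Set


def find_child_runs(client: Any, experiment_ids: List[str], parent_run_id: str) -> List[Any]:
    """Return all runs in `experiment_ids` whose parent is `parent_run_id`.

    `client` is assumed to be an mlflow.tracking.MlflowClient (or anything with
    the same search_runs signature). search_runs returns results one page at a
    time, so this keeps requesting pages until no continuation token is left.
    """
    filter_string = f"tags.mlflow.parentRunId = '{parent_run_id}'"
    runs: List[Any] = []
    page_token: Optional[str] = None
    while True:
        page = client.search_runs(
            experiment_ids=experiment_ids,
            filter_string=filter_string,
            page_token=page_token,
        )
        runs.extend(page)
        page_token = page.token  # None or empty when this was the last page
        if not page_token:
            return runs


def experiments_of(runs: List[Any]) -> Set[str]:
    """Set of experiment IDs that the given runs were logged to."""
    return {run.info.experiment_id for run in runs}
```

Usage, with hypothetical variable names: `children = find_child_runs(MlflowClient(), [experiment_A_id, experiment_B_id], parent_run_id)`, then `experiments_of(children)`. If the result contains experiment B's ID, the child runs were logged to the notebook's experiment rather than next to the parent.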
Workarounds and Solutions
1. Set the Experiment Globally Before Running
You must call mlflow.set_experiment() BEFORE invoking fmin(). It accepts either an experiment name or an experiment_id, and it sets the active experiment for the whole session, so both the parent and child runs go to the same experiment:
```python
import mlflow
from hyperopt import fmin, SparkTrials

mlflow.set_experiment("/Users/your-user/experiment_A_name")
with mlflow.start_run():
    trials = SparkTrials()
    fmin(objective_fn, trials=trials, **other_params)
```
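- Avoid also passing experiment_id to start_run() when using set_experiment(). An explicit experiment_id applies only to the run being started, so mixing the two makes it easy for the parent and child runs to end up in different experiments. set_experiment() stays in effect for all subsequent runs until you change it again.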
2. If Using Experiment ID Explicitly
If you want to use IDs:
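```python
# Same imports as in the previous example
mlflow.set_experiment(experiment_id=experiment_A_id)
with mlflow.start_run():
    trials = SparkTrials()
    fmin(objective_fn, trials=trials, **other_params)
```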
The key is to call set_experiment() before your fmin() call, rather than relying only on the experiment_id passed to start_run(), so that the child runs pick up the correct experiment.
3. Manual Consolidation (Not Preferred)
If the above isn't possible because of cluster or codebase constraints, you can:
- Use the MLflow APIs to copy runs into experiment A after the fact (for example, by re-logging each run's params, metrics, and tags). This is more work and not recommended unless necessary.
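A consolidation step could look roughly like the sketch below. It assumes there is no direct "move run" call available, so it creates a new run in the destination experiment and re-logs the source run's params, tags, and full metric history through `MlflowClient` methods (`get_run`, `create_run`, `log_param`, `get_metric_history`, `log_metric`, `set_terminated`). `copy_run` is a hypothetical helper, artifacts and models are not copied, and copying all tags verbatim is an assumption you may want to narrow.

```python
from typing import Any


def copy_run(client: Any, run_id: str, dest_experiment_id: str) -> str:
    """Re-log one run's params, tags, and metric history into another experiment.

    `client` is assumed to be an mlflow.tracking.MlflowClient. Artifacts are NOT
    copied. Returns the ID of the newly created run.
    """
    src = client.get_run(run_id)
    # Copying tags keeps mlflow.parentRunId, so a child run copied into the
    # parent's experiment (experiment A here) should nest under that parent.
    new_run = client.create_run(
        experiment_id=dest_experiment_id,
        start_time=src.info.start_time,
        tags=dict(src.data.tags),
    )
    new_id = new_run.info.run_id

    for key, value in src.data.params.items():
        client.log_param(new_id, key, value)

    # data.metrics only holds the latest value per key, so fetch each history.
    for key in src.data.metrics:
        for m in client.get_metric_history(run_id, key):
            client.log_metric(new_id, m.key, m.value, timestamp=m.timestamp, step=m.step)

    # Mirror the original run's final status and end time.
    client.set_terminated(new_id, status=src.info.status, end_time=src.info.end_time)
    return new_id
```

Usage, with hypothetical names: `client = MlflowClient()`, then `copy_run(client, rid, experiment_A_id)` for each child run ID found in experiment B. Once the copies look right, the originals can be removed with `client.delete_run(rid)`.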
Important Tips
- Always set the experiment at the notebook (or job) level before starting your trials.
- The parent run's context does not "trickle down" to the child runs automatically with SparkTrials.
- If using Databricks jobs, ensure the MLflow experiment is set at the job level, too.
Summary:
To log all runs to experiment A, call mlflow.set_experiment() for experiment A at the start of your notebook or script, before any runs are started and in particular before calling fmin(). Do not rely on the parent run's experiment alone to determine where the child runs are logged when using SparkTrials.