Proper mlflow run logging with SparkTrials and Hyp...

rtreves · ‎02-10-2025

Hello!

I'm attempting to run a hyperparameter search using hyperopt and SparkTrials(), and log the resulting runs to an existing experiment (experiment A). I can see on this page that Databricks suggests wrapping the `fmin()` call within an `mlflow.start_run()` statement, so I wrote this code in a notebook, which by default is associated with its own experiment (experiment B):

with mlflow.start_run(experiment_id=experiment_A):
    trials = SparkTrials()
    fmin(objective_fn,
        trials=trials,
        **other_params)

However, the result I get is a single parent run logged in experiment_A with all its child runs logged in experiment_B. This is obviously not desirable. Is there any way I can get both the parent and child runs from the trials logged in experiment A (i.e., not the experiment associated with the notebook being run)?

mark_ott · ‎11-06-2025

In a Hyperopt sweep on Databricks, the experiment_id passed to mlflow.start_run() applies only to the parent run. The child runs created by SparkTrials are logged to the active experiment, which by default is the one associated with the notebook. That is why, as you noticed, the child runs stay in the notebook's experiment (experiment B) even though the parent run is created in your chosen experiment (experiment A).

Why This Happens

The experiment_id you provide to mlflow.start_run() sets the experiment for the parent run only.
However, with SparkTrials() (which parallelizes trials across Spark workers), each child run is created against the notebook's default experiment rather than the parent run's experiment, and that default is typically experiment B.

Workarounds and Solutions

1. Set the Experiment Globally Before Running

Call mlflow.set_experiment(), which accepts either an experiment name or an experiment ID, before invoking fmin(). This changes the active experiment for the entire context, so both parent and child runs go to the same experiment:

python

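import mlflow
from hyperopt import fmin, SparkTrials

# Make experiment A the active experiment before any run starts
mlflow.set_experiment("/Users/your-user/experiment_A_name")
with mlflow.start_run():
    trials = SparkTrials()
    fmin(objective_fn, trials=trials, **other_params)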

Omit the experiment_id argument in start_run() when you use set_experiment(). An explicit experiment_id applies only to the run it starts, whereas set_experiment() sets the active experiment for all subsequent runs until you change it again.

2. If Using Experiment ID Explicitly

If you want to use IDs:

python

mlflow.set_experiment(experiment_id=experiment_A_id)
with mlflow.start_run():
    trials = SparkTrials()
    fmin(objective_fn, trials=trials, **other_params)

The key is to invoke set_experiment() before your fmin() call—not just within the start_run() context—so the worker processes pick up the correct experiment.

3. Manual Consolidation (Not Preferred)

If the above isn’t possible due to distributed nuances or codebase constraints, you can:

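Re-create the runs in the target experiment after the fact using the MLflow client APIs. MLflow has no single call that moves a run between experiments, so this means reading each run's parameters and metrics and logging them again under experiment A. This is more work and not recommended unless necessary.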
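
As a rough illustration of the copy approach, here is a minimal sketch. The helper name copy_run and the copied_from_run_id tag are hypothetical, made up for this example; the client methods it calls (create_run, log_param, log_metric, set_terminated) are the standard MlflowClient ones. It assumes you only need the parameters and the latest value of each metric; metric history and artifacts are not carried over.

```python
def copy_run(client, source_run, target_experiment_id, parent_run_id=None):
    """Re-create source_run in another experiment and return the new run ID.

    client is expected to behave like mlflow.tracking.MlflowClient, and
    source_run like the Run objects returned by client.get_run() or
    client.search_runs().
    """
    # Hypothetical tag so the copy can be traced back to the original run.
    tags = {"copied_from_run_id": source_run.info.run_id}
    if parent_run_id is not None:
        # Standard MLflow tag that nests a run under a parent run.
        tags["mlflow.parentRunId"] = parent_run_id

    new_run = client.create_run(target_experiment_id, tags=tags)
    new_run_id = new_run.info.run_id

    # run.data.params and run.data.metrics are plain dicts; the metrics
    # dict holds only the latest value for each metric key.
    for key, value in source_run.data.params.items():
        client.log_param(new_run_id, key, value)
    for key, value in source_run.data.metrics.items():
        client.log_metric(new_run_id, key, value)

    # Runs made with create_run stay active until explicitly terminated.
    client.set_terminated(new_run_id)
    return new_run_id
```

In a notebook you would pass MlflowClient() (from mlflow.tracking) as client, loop over the child runs found in experiment B, and pass the parent run's ID as parent_run_id so the copies nest under the parent in experiment A.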

Important Tips

Always set the experiment at the notebook (or job) level before starting your trials.
The parent run’s context does not “trickle down” to the child runs automatically with SparkTrials.
If using Databricks jobs, ensure the MLflow experiment context is set at the job level, too.
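
To confirm where the child runs actually landed after a sweep, a check along these lines may help. This is a sketch, not a Databricks-provided utility: the helper name child_runs_by_experiment is hypothetical, and it assumes the child runs carry the standard mlflow.parentRunId tag that MLflow uses for nested runs.

```python
from collections import Counter


def child_runs_by_experiment(client, parent_run_id, experiment_ids):
    """Count the child runs of parent_run_id found in each experiment.

    client is expected to behave like mlflow.tracking.MlflowClient.
    Only the first page of search results is counted.
    """
    runs = client.search_runs(
        experiment_ids=list(experiment_ids),
        # Nested runs point at their parent through this tag.
        filter_string=f"tags.mlflow.parentRunId = '{parent_run_id}'",
    )
    return Counter(run.info.experiment_id for run in runs)
```

Called with MlflowClient(), the parent run's ID, and the IDs of experiments A and B, the result should show every child run under experiment A once set_experiment() is in place; any count under experiment B means the children are still following the notebook's default.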

Summary:
To log all runs to experiment A, call mlflow.set_experiment() for experiment A at the start of your notebook or script, before any runs are started or before calling fmin(). Do not rely on the parent run context alone to set experiment affiliation for children when using SparkTrials.