Databricks Community

dkxxx-rc · ‎01-07-2025

How do I get MLflow child runs to appear as children of their parent run in the MLflow GUI, if I'm choosing my own experiment location instead of letting everything be written to the default experiment location?

If I run the standard tutorial (https://docs.databricks.com/_extras/notebooks/source/mlflow/mlflow-end-to-end-example-uc.html) of running parameter tuning on an XGBoost model, with logging to MLflow, the individual runs are grouped together nicely in the MLflow UI under the default experiment location.

But there's trouble with the nesting if I take control of the name and location of the MLflow experiment. Say I set up an experiment location as follows:

EXPERIMENT_NAME = '/Users/dxxxx@realchemistry.com/MLflow_experiments/dxxxx_minimal_MLflow'

# Get the experiment ID if it exists, or create a new one
experiment_id = mlflow.get_experiment_by_name(EXPERIMENT_NAME)

if experiment_id is None:
    # If the experiment does not exist, create it
    experiment_id = mlflow.create_experiment(EXPERIMENT_NAME)
else:
    # If the experiment exists, get its ID
    experiment_id = experiment_id.experiment_id

If I do a single model training run, using

with mlflow.start_run(experiment_id=experiment_id, run_name='untuned_random_forest'):

the model is archived with run name untuned_random_forest to a new experiment page dxxxx_minimal_MLflow exactly as I intend.

However, trouble turns up when I try a parameter optimization job with the runs to be nested. I set the experiment_id using

# Run fmin within an MLflow run context so that each hyperparameter configuration is logged as a child run of a parent
# run called "xgboost_models_2".
with mlflow.start_run(experiment_id=experiment_id, run_name='xgboost_models_2') as parent_run:
  run_id_value = parent_run.info.run_id
  search_space['parent_run_id'] = run_id_value
  best_params = fmin(
    fn=train_model, 
    space=search_space, 
    algo=tpe.suggest, 
    max_evals=8,
    trials=spark_trials,
  )

which invokes the defined function train_model():

def train_model(params):
  mlflow.xgboost.autolog()
  with mlflow.start_run(nested=True):
    train = xgb.DMatrix(data=X_train, label=y_train)
    validation = xgb.DMatrix(data=X_val, label=y_val)
    # ... et cetera

the nesting (note nested=True) doesn't work, or at least doesn't appear to work. The bizarre outcome is that my experiment page gets a new run called xgboost_models_2, but it doesn't have any children. All the child runs are visible, but not on my experiment page: they're only visible on the default experiment page, with no indication that they're children of anything. If you look inside the child runs, they each have a parent_run_id that seems right, but the GUI doesn't group them under the parent run on my personal experiment page.

Walter_C · ‎01-08-2025

To ensure that MLflow child runs appear as children of their parent run in the MLflow GUI when using a custom experiment location, follow these steps:

Set Up the Experiment Location:

EXPERIMENT_NAME = '/Users/dxxxx@realchemistry.com/MLflow_experiments/dxxxx_minimal_MLflow'

# Get the experiment ID if it exists, or create a new one
experiment_id = mlflow.get_experiment_by_name(EXPERIMENT_NAME)

if experiment_id is None:
    # If the experiment does not exist, create it
    experiment_id = mlflow.create_experiment(EXPERIMENT_NAME)
else:
    # If the experiment exists, get its ID
    experiment_id = experiment_id.experiment_id

Start the Parent Run:

with mlflow.start_run(experiment_id=experiment_id, run_name='xgboost_models_2') as parent_run:
    run_id_value = parent_run.info.run_id
    search_space['parent_run_id'] = run_id_value
    best_params = fmin(
        fn=train_model, 
        space=search_space, 
        algo=tpe.suggest, 
        max_evals=8,
        trials=spark_trials,
    )

Define the Training Function with Nested Runs:

def train_model(params):
    mlflow.xgboost.autolog()
    with mlflow.start_run(nested=True):
        train = xgb.DMatrix(data=X_train, label=y_train)
        validation = xgb.DMatrix(data=X_val, label=y_val)
        # Additional training code here

Ensure Correct Parent-Child Relationship:
- Verify that the parent_run_id is correctly set in the search_space.
- Ensure that the nested=True parameter is used in the mlflow.start_run call within the train_model function.

dkxxx-rc · ‎01-08-2025

Hi, thanks for your response. It doesn't seem to help at all, however. The solution you suggest is what I've already done (including once more just now, to make sure), and it achieves the same outcome I've already described:

- The parent run appears on my own experiment page with no children.
- The child runs appear on the default experiment page with no parents.

Let me try to provide a little more detail in case it's helpful.

My latest parent run has Run ID = `5e0500d99c9d41069138d9e10fe7e83e`
Looking into one of the child runs, it has its own Run ID value and it has a field "Parent run" which points to the same parent run -- the value is a hyperlink to https://[redacted].cloud.databricks.com/ml/experiments/4161759641583557/runs/5e0500d99c9d41069138d9e... which points to that same parent Run ID.
And yet, the child runs still show up in the GUI only on the default Experiment page, not grouped with the Parent run (which is still living by itself on my Experiment page with no children).

It looks somewhat like the `nested=True` parameter is doing a good job of getting the parent run ID assigned to the child run, but the GUI isn't honoring the parent-child relationship when it decides where to display the parent and child runs.

FOOTNOTE: You mention setting `parent_run_id` without saying what to use it for. Do you think there's a useful way to use it? I created it only as part of a later experiment, to try passing it as an optional argument to the inner `mlflow.start_run()` call, but it didn't seem to have any effect on the outcome.

Walter_C · ‎01-08-2025

When creating child runs, explicitly set the parent run ID:

def train_model(params):
    mlflow.xgboost.autolog()
    with mlflow.start_run(nested=True, run_name="child_run", parent_run_id=parent_run.info.run_id):
        # Your existing code here

dkxxx-rc · ‎01-08-2025

This has no new effect. Still unsuccessful at grouping the child runs under the parent.

(Which seems pretty reasonable, honestly, since as noted above, the Parent Run ID is already correctly tagged on the child runs.)

dkxxx-rc · ‎01-08-2025

OK, here's more info about what's wrong, and a solution.

I used additional parameter logging to determine that no matter how I adjust the parameters of the inner call to
```
mlflow.start_run()
```

the `experiment_id` parameter of the child runs differs from that of the parent runs. It ignores `nested=True`, it ignores passing in a value of `experiment_id`, and it sets its own child `experiment_id` to a value corresponding to a new Experiment page named the same as the name of the notebook. Therefore, since parent and children have conflicting experiment_id values, they don't group together in the GUI.

That's pretty annoying.
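
A mismatch like the one described above can be checked mechanically once the run metadata is in hand. The following is only a minimal sketch: the record shape (plain dicts with `run_id`, `experiment_id`, and `parent_run_id` keys) and the helper name `find_misplaced_children` are assumptions made for illustration, not MLflow types or functions.

```python
def find_misplaced_children(runs):
    """Return child runs whose experiment differs from their parent's.

    `runs` is a list of dicts with the keys 'run_id', 'experiment_id' and
    'parent_run_id' (None for top-level runs). This record shape is an
    assumption of the sketch, not something MLflow returns directly.
    """
    # Map every run to the experiment it was written to.
    experiment_of = {r["run_id"]: r["experiment_id"] for r in runs}

    misplaced = []
    for r in runs:
        parent = r.get("parent_run_id")
        # Skip top-level runs and children whose parent is not in the list.
        if parent is None or parent not in experiment_of:
            continue
        if experiment_of[parent] != r["experiment_id"]:
            # (child run, child's experiment, parent's experiment)
            misplaced.append((r["run_id"], r["experiment_id"], experiment_of[parent]))
    return misplaced
```

Any tuple in the result is a child that will not be displayed under its parent, because the two runs live in different experiments. In MLflow itself the parent link is stored as the `mlflow.parentRunId` tag on the child run, so the records would need to be built from that tag and each run's experiment ID.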

However, the whole problem goes away if I set an `experiment_id` value in a global sense, back at the beginning. Specifically, in the block that sets and uses EXPERIMENT_NAME, add one more line of code at the end:
```
mlflow.set_experiment(experiment_id=experiment_id)
```
and then everything works exactly as it should. The child runs show up as nested under the parent run in my personal Experiment space.
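
The setup block plus the extra `mlflow.set_experiment` line can be folded into one helper. This is a sketch under assumptions rather than a definitive recipe: the function name `use_experiment` and its `tracking` parameter are hypothetical, added only so the logic can be exercised without a tracking server. By default it uses the real `mlflow` module and the same three calls that appear in this thread.

```python
def use_experiment(name, tracking=None):
    """Look up (or create) the experiment `name`, make it the active
    experiment, and return its ID.

    `tracking` is a hypothetical hook for this sketch: any object that
    offers get_experiment_by_name, create_experiment and set_experiment.
    When omitted, the real mlflow module is used.
    """
    if tracking is None:
        import mlflow as tracking

    experiment = tracking.get_experiment_by_name(name)
    if experiment is None:
        # The experiment does not exist yet, so create it.
        experiment_id = tracking.create_experiment(name)
    else:
        experiment_id = experiment.experiment_id

    # Make this the active experiment, so that runs started later without
    # an explicit experiment_id (such as the nested ones) default to it.
    tracking.set_experiment(experiment_id=experiment_id)
    return experiment_id
```

Calling `experiment_id = use_experiment(EXPERIMENT_NAME)` once near the top of the notebook, before the parent run is started, would correspond to the fix described above. `mlflow.set_experiment` also accepts an experiment name and creates the experiment if it is missing, which may make the explicit lookup unnecessary in some setups.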

kirpi · 3 weeks ago

I had this same problem, and followed the same steps, but it did not work until I also explicitly set the experiment_id of the children.

EXPERIMENT_NAME = '/Users/my_user_name/my_experiment'
experiment_id = mlflow.get_experiment_by_name(EXPERIMENT_NAME)

if experiment_id is None:
    experiment_id = mlflow.create_experiment(EXPERIMENT_NAME)
else:
    experiment_id = experiment_id.experiment_id
mlflow.set_experiment(experiment_id=experiment_id)

#
#
#

def train_model(params):
    with mlflow.start_run(nested=True, parent_run_id=params['parent_run_id'], experiment_id=params['experiment_id']):
        # training and logging here
#
#
#
search_space = {}  # dict of parameters (contents elided)
with mlflow.start_run(run_name=my_run_name) as parent_run:
    run_id_value = parent_run.info.run_id
    search_space['parent_run_id'] = run_id_value
    search_space['experiment_id'] = experiment_id
    best_params = fmin(fn=train_model, 
                       space=search_space, 
                       #etc
                       )
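
One side effect of the approach above is that `parent_run_id` and `experiment_id` travel inside `params` together with the real hyperparameters. If the same dict is also handed to the model, those two extra keys go along with it. A minimal sketch of separating them first follows; the helper name `split_params` and the `TRACKING_KEYS` constant are hypothetical, introduced only for this illustration.

```python
# Keys that carry run-tracking information rather than hyperparameters.
# These two names match the search_space entries used in this thread.
TRACKING_KEYS = ("parent_run_id", "experiment_id")


def split_params(params):
    """Separate run-tracking entries from model hyperparameters.

    Returns a (tracking, model) pair of dicts; `params` is not modified.
    """
    tracking = {k: params[k] for k in TRACKING_KEYS if k in params}
    model = {k: v for k, v in params.items() if k not in TRACKING_KEYS}
    return tracking, model


# Inside train_model(params), this might be used as:
#   tracking, model_params = split_params(params)
#   with mlflow.start_run(nested=True, **tracking):
#       booster = xgb.train(params=model_params, ...)
```

The design choice here is simply to keep the tracking metadata out of the hyperparameter dict, so that what is passed to the model and logged as parameters stays limited to what the model actually uses.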

Nested runs don't group correctly in MLflow
