yesterday
How do I get MLflow child runs to appear as children of their parent run in the MLflow GUI, if I'm choosing my own experiment location instead of letting everything be written to the default experiment location?
If I run the standard tutorial (https://docs.databricks.com/_extras/notebooks/source/mlflow/mlflow-end-to-end-example-uc.html) of running parameter tuning on an XGBoost model, with logging to MLflow, the individual runs are grouped together nicely in the MLflow UI under the default experiment location.
But there's trouble with the nesting if I take control of the name and location of the MLflow experiment. Say I set up an experiment location as follows:
EXPERIMENT_NAME = '/Users/dxxxx@realchemistry.com/MLflow_experiments/dxxxx_minimal_MLflow'
# Look up the experiment by name; this returns None if it does not exist
experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)
if experiment is None:
    # If the experiment does not exist, create it (create_experiment returns the new ID)
    experiment_id = mlflow.create_experiment(EXPERIMENT_NAME)
else:
    # If the experiment exists, get its ID
    experiment_id = experiment.experiment_id
If I do a single model training run, using
with mlflow.start_run(experiment_id=experiment_id, run_name='untuned_random_forest'):
the model is archived with run name untuned_random_forest to a new experiment page dxxxx_minimal_MLflow exactly as I intend.
However, trouble turns up when I try a parameter optimization job with the runs to be nested. I set the experiment_id using
# Run fmin within an MLflow run context so that each hyperparameter configuration is logged as a child run of a parent
# run called "xgboost_models".
with mlflow.start_run(experiment_id=experiment_id, run_name='xgboost_models_2') as parent_run:
    run_id_value = parent_run.info.run_id
    search_space['parent_run_id'] = run_id_value
    best_params = fmin(
        fn=train_model,
        space=search_space,
        algo=tpe.suggest,
        max_evals=8,
        trials=spark_trials,
    )
which invokes the defined function train_model():
def train_model(params):
    mlflow.xgboost.autolog()
    with mlflow.start_run(nested=True):
        train = xgb.DMatrix(data=X_train, label=y_train)
        validation = xgb.DMatrix(data=X_val, label=y_val)
        {et cetera}
the nesting (note nested=True) doesn't work, or at least doesn't appear to work. The bizarre outcome is that my experiment page gets a new run called xgboost_models_2, but it doesn't have any children. All the child runs are visible, but not on my experiment page -- they're only visible on the default experiment page, with no indication that they're children of anything. If you look inside the child runs, they each have a parent_run_id that seems right, but the GUI can't seem to figure out that it should group them under the parent run on my personal experiment page.
8 hours ago
To ensure that MLflow child runs appear as children of their parent run in the MLflow GUI when using a custom experiment location, follow these steps:
Set Up the Experiment Location:
EXPERIMENT_NAME = '/Users/dxxxx@realchemistry.com/MLflow_experiments/dxxxx_minimal_MLflow'
# Look up the experiment by name; this returns None if it does not exist
experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)
if experiment is None:
    # If the experiment does not exist, create it (create_experiment returns the new ID)
    experiment_id = mlflow.create_experiment(EXPERIMENT_NAME)
else:
    # If the experiment exists, get its ID
    experiment_id = experiment.experiment_id
Start the Parent Run:
with mlflow.start_run(experiment_id=experiment_id, run_name='xgboost_models_2') as parent_run:
    run_id_value = parent_run.info.run_id
    search_space['parent_run_id'] = run_id_value
    best_params = fmin(
        fn=train_model,
        space=search_space,
        algo=tpe.suggest,
        max_evals=8,
        trials=spark_trials,
    )
Define the Training Function with Nested Runs:
def train_model(params):
    mlflow.xgboost.autolog()
    with mlflow.start_run(nested=True):
        train = xgb.DMatrix(data=X_train, label=y_train)
        validation = xgb.DMatrix(data=X_val, label=y_val)
        # Additional training code here
Ensure Correct Parent-Child Relationship:
Ensure that `parent_run_id` is correctly set in the `search_space`.
Ensure that the `nested=True` parameter is used in the `mlflow.start_run` call within the `train_model` function.
4 hours ago
Hi, thanks for your response. It doesn't seem to help at all, however. The solution you suggest is what I've already done (including once more just now, to make sure), and it achieves the same outcome I've already described.
Let me try to provide a little more detail in case it's helpful.
It looks somewhat like the `nested=True` parameter is doing a good job of getting the parent run ID assigned to the child run, but the GUI isn't honoring the parent-child relationship when it decides where to display the parent and child runs.
FOOTNOTE: You mention setting `parent_run_id` without saying what to use it for. Do you think there's a useful way to use it? I created it only as part of a later experiment, to try passing it as an optional argument to the inner `mlflow.start_run()` call, but it didn't seem to have any effect on the outcome.
3 hours ago
When creating child runs, explicitly set the parent run ID:
def train_model(params):
    mlflow.xgboost.autolog()
    with mlflow.start_run(nested=True, run_name="child_run", parent_run_id=parent_run.info.run_id):
        # Your existing code here
3 hours ago
This has no new effect. Still unsuccessful at grouping the child runs under the parent.
(Which seems pretty reasonable, honestly, since as noted above, the Parent Run ID is already correctly tagged on the child runs.)
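If the parent tag is already right, the next thing worth comparing is which experiment each run landed in. Here is a minimal sketch of that check. It is plain Python over run records; the helper name `find_misplaced_children` and the sample IDs are made up for illustration. On a real workspace the records could presumably be built from `mlflow.search_runs(search_all_experiments=True)`, where the parent ID should appear in the `tags.mlflow.parentRunId` column.

```python
def find_misplaced_children(runs):
    """Return children whose experiment differs from their parent's experiment.

    Each record is a dict with 'run_id', 'experiment_id' and 'parent_run_id'
    (None for top-level runs). This is a hypothetical helper, not an MLflow API.
    """
    experiment_of = {r["run_id"]: r["experiment_id"] for r in runs}
    misplaced = []
    for r in runs:
        parent_id = r.get("parent_run_id")
        # Skip top-level runs and children whose parent is not in the records.
        if parent_id is None or parent_id not in experiment_of:
            continue
        if r["experiment_id"] != experiment_of[parent_id]:
            misplaced.append(
                (r["run_id"], r["experiment_id"], parent_id, experiment_of[parent_id])
            )
    return misplaced


# Hypothetical records: the parent tag is correct on both children,
# but one child was written to a different experiment.
runs = [
    {"run_id": "parent", "experiment_id": "111", "parent_run_id": None},
    {"run_id": "child_a", "experiment_id": "222", "parent_run_id": "parent"},
    {"run_id": "child_b", "experiment_id": "111", "parent_run_id": "parent"},
]
misplaced = find_misplaced_children(runs)
print(misplaced)
```

Any row this reports is a child that carries the correct parent ID but lives on a different experiment page than its parent.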
an hour ago
OK, here's more info about what's wrong, and a solution.
I used additional parameter logging to determine that no matter how I adjust the parameters of the inner call to
```
mlflow.start_run()
```
the `experiment_id` parameter of the child runs differs from that of the parent runs. It ignores `nested=True`, it ignores passing in a value of `experiment_id`, and it sets its own child `experiment_id` to a value corresponding to a new Experiment page named the same as the name of the notebook. Therefore, since parent and children have conflicting experiment_id values, they don't group together in the GUI.
That's pretty annoying.
However, the whole problem goes away if I set an `experiment_id` value in a global sense, back at the beginning. Specifically, in the block that sets and uses EXPERIMENT_NAME, add one more line of code at the end:
```
mlflow.set_experiment(experiment_id=experiment_id)
```
and then everything works exactly as it should. The child runs show up as nested under the parent run in my personal Experiment space.