<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Nested runs don't group correctly in MLflow in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/nested-runs-don-t-group-correctly-in-mlflow/m-p/114134#M4010</link>
    <pubDate>Tue, 01 Apr 2025 03:58:20 GMT</pubDate>
    <dc:creator>kirpi</dc:creator>
    <dc:date>2025-04-01T03:58:20Z</dc:date>
    <item>
      <title>Nested runs don't group correctly in MLflow</title>
      <link>https://community.databricks.com/t5/machine-learning/nested-runs-don-t-group-correctly-in-mlflow/m-p/104616#M3886</link>
      <description>&lt;P&gt;How do I get MLflow child runs to appear as children of their parent run in the MLflow GUI, if I'm choosing my own experiment location instead of letting everything be written to the default experiment location?&lt;/P&gt;&lt;P&gt;If I run the standard tutorial (&lt;A href="https://docs.databricks.com/_extras/notebooks/source/mlflow/mlflow-end-to-end-example-uc.html" target="_blank"&gt;https://docs.databricks.com/_extras/notebooks/source/mlflow/mlflow-end-to-end-example-uc.html&lt;/A&gt;) that runs parameter tuning on an XGBoost model, with logging to MLflow, the individual runs are grouped together nicely in the MLflow UI under the default experiment location:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="dkxxxrc_0-1736289524445.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/13932i2D0CB1E65D906570/image-size/medium?v=v2&amp;amp;px=400" role="button" title="dkxxxrc_0-1736289524445.png" alt="dkxxxrc_0-1736289524445.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;But there's trouble with the nesting if I take control of the name and location of the MLflow experiment.&amp;nbsp; Say I set up an experiment location as follows:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;EXPERIMENT_NAME = '/Users/dxxxx@realchemistry.com/MLflow_experiments/dxxxx_minimal_MLflow'

# Get the experiment ID if it exists, or create a new one
experiment_id = mlflow.get_experiment_by_name(EXPERIMENT_NAME)

if experiment_id is None:
    # If the experiment does not exist, create it
    experiment_id = mlflow.create_experiment(EXPERIMENT_NAME)
else:
    # If the experiment exists, get its ID
    experiment_id = experiment_id.experiment_id&lt;/LI-CODE&gt;&lt;P&gt;If I do a single model training run, using&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;with mlflow.start_run(experiment_id=experiment_id, run_name='untuned_random_forest'):&lt;/LI-CODE&gt;&lt;P&gt;the model is archived with run name &lt;EM&gt;untuned_random_forest&lt;/EM&gt; to a new experiment page &lt;STRONG&gt;dxxxx_minimal_MLflow&lt;/STRONG&gt;, exactly as I intend.&lt;BR /&gt;&lt;BR /&gt;However, trouble turns up when I try a parameter optimization job with nested runs.&amp;nbsp; I set the experiment_id using&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Run fmin within an MLflow run context so that each hyperparameter configuration is logged as a child run of a parent
# run called "xgboost_models_2".
with mlflow.start_run(experiment_id=experiment_id, run_name='xgboost_models_2') as parent_run:
  run_id_value = parent_run.info.run_id
  search_space['parent_run_id'] = run_id_value
  best_params = fmin(
    fn=train_model, 
    space=search_space, 
    algo=tpe.suggest, 
    max_evals=8,
    trials=spark_trials,
  )&lt;/LI-CODE&gt;&lt;P&gt;which invokes the defined function train_model():&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def train_model(params):
  mlflow.xgboost.autolog()
  with mlflow.start_run(nested=True):
    train = xgb.DMatrix(data=X_train, label=y_train)
    validation = xgb.DMatrix(data=X_val, label=y_val)
    # et cetera&lt;/LI-CODE&gt;&lt;P&gt;Then the nesting (note &lt;STRONG&gt;nested=True&lt;/STRONG&gt;) doesn't work, or at least doesn't appear to work.&amp;nbsp; The bizarre outcome is that my experiment page gets a new run called &lt;STRONG&gt;xgboost_models_2&lt;/STRONG&gt;, but it doesn't have any children.&amp;nbsp; All the child runs are visible, but not on my experiment page: they appear only on the default experiment page, with no indication that they're children of anything.&amp;nbsp; If you look inside the child runs, each has a parent_run_id that seems right, but the GUI doesn't group them under the parent run on my personal experiment page.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jan 2025 22:53:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/nested-runs-don-t-group-correctly-in-mlflow/m-p/104616#M3886</guid>
      <dc:creator>dkxxx-rc</dc:creator>
      <dc:date>2025-01-07T22:53:52Z</dc:date>
    </item>
    <item>
      <title>Re: Nested runs don't group correctly in MLflow</title>
      <link>https://community.databricks.com/t5/machine-learning/nested-runs-don-t-group-correctly-in-mlflow/m-p/104692#M3887</link>
      <description>&lt;P class="_1t7bu9h1 paragraph"&gt;To ensure that MLflow child runs appear as children of their parent run in the MLflow GUI when using a custom experiment location, follow these steps:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Set Up the Experiment Location:&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="gb5fhw2"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python _1t7bu9hb hljs language-python gb5fhw3"&gt;EXPERIMENT_NAME = &lt;SPAN class="hljs-string"&gt;'/Users/dxxxx@realchemistry.com/MLflow_experiments/dxxxx_minimal_MLflow'&lt;/SPAN&gt;

&lt;SPAN class="hljs-comment"&gt;# Get the experiment ID if it exists, or create a new one&lt;/SPAN&gt;
experiment_id = mlflow.get_experiment_by_name(EXPERIMENT_NAME)

&lt;SPAN class="hljs-keyword"&gt;if&lt;/SPAN&gt; experiment_id &lt;SPAN class="hljs-keyword"&gt;is&lt;/SPAN&gt; &lt;SPAN class="hljs-literal"&gt;None&lt;/SPAN&gt;:
    &lt;SPAN class="hljs-comment"&gt;# If the experiment does not exist, create it&lt;/SPAN&gt;
    experiment_id = mlflow.create_experiment(EXPERIMENT_NAME)
&lt;SPAN class="hljs-keyword"&gt;else&lt;/SPAN&gt;:
    &lt;SPAN class="hljs-comment"&gt;# If the experiment exists, get its ID&lt;/SPAN&gt;
    experiment_id = experiment_id.experiment_id&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Start the Parent Run:&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="gb5fhw2"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python _1t7bu9hb hljs language-python gb5fhw3"&gt;&lt;SPAN class="hljs-keyword"&gt;with&lt;/SPAN&gt; mlflow.start_run(experiment_id=experiment_id, run_name=&lt;SPAN class="hljs-string"&gt;'xgboost_models_2'&lt;/SPAN&gt;) &lt;SPAN class="hljs-keyword"&gt;as&lt;/SPAN&gt; parent_run:
    run_id_value = parent_run.info.run_id
    search_space[&lt;SPAN class="hljs-string"&gt;'parent_run_id'&lt;/SPAN&gt;] = run_id_value
    best_params = fmin(
        fn=train_model, 
        space=search_space, 
        algo=tpe.suggest, 
        max_evals=&lt;SPAN class="hljs-number"&gt;8&lt;/SPAN&gt;,
        trials=spark_trials,
    )&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Define the Training Function with Nested Runs:&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="gb5fhw2"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python _1t7bu9hb hljs language-python gb5fhw3"&gt;&lt;SPAN class="hljs-keyword"&gt;def&lt;/SPAN&gt; &lt;SPAN class="hljs-title function_"&gt;train_model&lt;/SPAN&gt;(&lt;SPAN class="hljs-params"&gt;params&lt;/SPAN&gt;):
    mlflow.xgboost.autolog()
    &lt;SPAN class="hljs-keyword"&gt;with&lt;/SPAN&gt; mlflow.start_run(nested=&lt;SPAN class="hljs-literal"&gt;True&lt;/SPAN&gt;):
        train = xgb.DMatrix(data=X_train, label=y_train)
        validation = xgb.DMatrix(data=X_val, label=y_val)
        &lt;SPAN class="hljs-comment"&gt;# Additional training code here&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Ensure Correct Parent-Child Relationship:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;Verify that the &lt;CODE&gt;parent_run_id&lt;/CODE&gt; is correctly set in the &lt;CODE&gt;search_space&lt;/CODE&gt;.&lt;/LI&gt;
&lt;LI&gt;Ensure that the &lt;CODE&gt;nested=True&lt;/CODE&gt; parameter is used in the &lt;CODE&gt;mlflow.start_run&lt;/CODE&gt; call within the &lt;CODE&gt;train_model&lt;/CODE&gt; function.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Wed, 08 Jan 2025 12:20:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/nested-runs-don-t-group-correctly-in-mlflow/m-p/104692#M3887</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-08T12:20:45Z</dc:date>
    </item>
    <item>
      <title>Re: Nested runs don't group correctly in MLflow</title>
      <link>https://community.databricks.com/t5/machine-learning/nested-runs-don-t-group-correctly-in-mlflow/m-p/104735#M3888</link>
      <description>&lt;P&gt;Hi, thanks for your response.&amp;nbsp; Unfortunately it doesn't seem to help.&amp;nbsp; The solution you suggest is what I've already done (including once more just now, to make sure), and it produces the same outcome I've already described:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;the parent run appears on my own experiment page with no children&lt;/LI&gt;&lt;LI&gt;the child runs appear on the default experiment page with no parents&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Here is a little more detail in case it's helpful.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;My latest parent run has Run ID = `5e0500d99c9d41069138d9e10fe7e83e`&lt;/LI&gt;&lt;LI&gt;When I look into one of the child runs, it has its own Run ID and a "Parent run" field that points to the same parent run: the value is a hyperlink to &lt;A href="https://[redacted].cloud.databricks.com/ml/experiments/4161759641583557/runs/5e0500d99c9d41069138d9e10fe7e83e" target="_blank"&gt;https://[redacted].cloud.databricks.com/ml/experiments/4161759641583557/runs/5e0500d99c9d41069138d9e10fe7e83e&lt;/A&gt;, which carries that same parent Run ID.&lt;/LI&gt;&lt;LI&gt;And yet the child runs still show up in the GUI only on the default Experiment page, not grouped with the Parent run (which still sits by itself on my Experiment page with no children).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;It looks as if the `nested=True` parameter correctly assigns the parent run ID to the child run, but the GUI isn't honoring the parent-child relationship when it decides where to display the parent and child runs.&lt;/P&gt;&lt;P&gt;FOOTNOTE:&amp;nbsp; You mention setting `parent_run_id` without saying what to use it for.&amp;nbsp; Do you think there's a useful way to use it?&amp;nbsp; I created it only as part of a later experiment, to try passing it as an optional argument to the inner `mlflow.start_run()` call, but it didn't seem to have any effect on the outcome.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jan 2025 16:18:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/nested-runs-don-t-group-correctly-in-mlflow/m-p/104735#M3888</guid>
      <dc:creator>dkxxx-rc</dc:creator>
      <dc:date>2025-01-08T16:18:26Z</dc:date>
    </item>
    <item>
      <title>Re: Nested runs don't group correctly in MLflow</title>
      <link>https://community.databricks.com/t5/machine-learning/nested-runs-don-t-group-correctly-in-mlflow/m-p/104741#M3889</link>
      <description>&lt;P&gt;&lt;SPAN&gt;When creating child runs, explicitly set the parent run ID:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;def train_model(params):
mlflow.xgboost.autolog()
with mlflow.start_run(nested=True, run_name="child_run", parent_run_id=parent_run.info.run_id):
# Your existing code here&lt;/LI-CODE&gt;
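&lt;P&gt;Why a correct parent run ID by itself may not fix the grouping can be sketched with a toy model. This is not MLflow's implementation: it assumes, following the diagnosis later in this thread, that each experiment page builds its run tree only from runs logged to that experiment, so a child logged to a different experiment than its parent appears unparented. &lt;CODE&gt;Run&lt;/CODE&gt;, &lt;CODE&gt;group_runs&lt;/CODE&gt;, and the experiment IDs &lt;CODE&gt;"custom"&lt;/CODE&gt; and &lt;CODE&gt;"notebook"&lt;/CODE&gt; are hypothetical names for illustration.&lt;/P&gt;

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Run:
    # Hypothetical stand-in for a logged run (not an MLflow class)
    run_id: str
    experiment_id: str
    parent_run_id: Optional[str] = None


def group_runs(runs: List[Run], experiment_id: str) -> Dict[str, List[str]]:
    """Toy model of the run tree shown on one experiment page.

    Only runs logged to this experiment are considered. A run is nested
    under its parent only if the parent is on the same page; otherwise it
    is listed as a top-level run, even though its parent_run_id is set.
    """
    on_page = [r for r in runs if r.experiment_id == experiment_id]
    ids_on_page = {r.run_id for r in on_page}
    tree: Dict[str, List[str]] = {}
    for r in on_page:
        if r.parent_run_id in ids_on_page:
            # Parent is on this page: nest the run under it
            tree.setdefault(r.parent_run_id, []).append(r.run_id)
        else:
            # No parent, or parent lives in another experiment: top level
            tree.setdefault(r.run_id, [])
    return tree


parent = Run("p1", "custom")

# As reported in the thread: children carry the right parent_run_id but were
# logged to the notebook's default experiment
stray = [Run("c1", "notebook", "p1"), Run("c2", "notebook", "p1")]
print(group_runs([parent, *stray], "custom"))    # {'p1': []}
print(group_runs([parent, *stray], "notebook"))  # {'c1': [], 'c2': []}

# After mlflow.set_experiment(...), children share the parent's experiment
fixed = [Run("c1", "custom", "p1"), Run("c2", "custom", "p1")]
print(group_runs([parent, *fixed], "custom"))    # {'p1': ['c1', 'c2']}
```

&lt;P&gt;In this model only the experiment ID of the child runs differs between the two scenarios, which matches the later reports in this thread that setting the experiment globally (or passing it explicitly to each child run) fixed the grouping while &lt;CODE&gt;parent_run_id&lt;/CODE&gt; alone did not.&lt;/P&gt;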
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jan 2025 16:47:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/nested-runs-don-t-group-correctly-in-mlflow/m-p/104741#M3889</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-08T16:47:03Z</dc:date>
    </item>
    <item>
      <title>Re: Nested runs don't group correctly in MLflow</title>
      <link>https://community.databricks.com/t5/machine-learning/nested-runs-don-t-group-correctly-in-mlflow/m-p/104750#M3890</link>
      <description>&lt;P&gt;This has no new effect.&amp;nbsp; Still unsuccessful at grouping the child runs under the parent.&amp;nbsp;&lt;/P&gt;&lt;P&gt;(Which seems pretty reasonable, honestly, since as noted above, the Parent Run ID is already correctly tagged on the child runs.)&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jan 2025 17:23:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/nested-runs-don-t-group-correctly-in-mlflow/m-p/104750#M3890</guid>
      <dc:creator>dkxxx-rc</dc:creator>
      <dc:date>2025-01-08T17:23:00Z</dc:date>
    </item>
    <item>
      <title>Re: Nested runs don't group correctly in MLflow</title>
      <link>https://community.databricks.com/t5/machine-learning/nested-runs-don-t-group-correctly-in-mlflow/m-p/104763#M3891</link>
      <description>&lt;P&gt;OK, here's more info about what's wrong, and a solution.&lt;/P&gt;&lt;P&gt;I used additional parameter logging to determine that, no matter how I adjust the arguments of the inner call to `mlflow.start_run()`, the `experiment_id` of the child runs differs from that of the parent run.&amp;nbsp; It ignores `nested=True`, it ignores an `experiment_id` passed in explicitly, and it sets the child `experiment_id` to the ID of a new Experiment page named after the notebook.&amp;nbsp; Because parent and children have conflicting experiment_id values, they don't group together in the GUI.&lt;/P&gt;&lt;P&gt;That's pretty annoying.&lt;/P&gt;&lt;P&gt;However, the whole problem goes away if I set the experiment globally, at the beginning.&amp;nbsp; Specifically, in the block that sets and uses EXPERIMENT_NAME, add one more line of code at the end:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;mlflow.set_experiment(experiment_id=experiment_id)&lt;/LI-CODE&gt;&lt;P&gt;Then everything works exactly as it should: the child runs show up nested under the parent run in my personal Experiment space.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jan 2025 18:41:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/nested-runs-don-t-group-correctly-in-mlflow/m-p/104763#M3891</guid>
      <dc:creator>dkxxx-rc</dc:creator>
      <dc:date>2025-01-08T18:41:26Z</dc:date>
    </item>
    <item>
      <title>Re: Nested runs don't group correctly in MLflow</title>
      <link>https://community.databricks.com/t5/machine-learning/nested-runs-don-t-group-correctly-in-mlflow/m-p/114134#M4010</link>
      <description>&lt;P&gt;I had the same problem and followed the same steps, but it did not work until I also explicitly set the experiment_id of the child runs.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import mlflow
from hyperopt import fmin

EXPERIMENT_NAME = '/Users/my_user_name/my_experiment'
experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)

if experiment is None:
    experiment_id = mlflow.create_experiment(EXPERIMENT_NAME)
else:
    experiment_id = experiment.experiment_id
# Make this the active experiment for all runs that follow
mlflow.set_experiment(experiment_id=experiment_id)

# ...

def train_model(params):
    # Pass the parent run ID and the experiment ID explicitly to each child run
    with mlflow.start_run(nested=True, parent_run_id=params['parent_run_id'], experiment_id=params['experiment_id']):
        # training and logging here (return the loss for fmin)
        ...

# ...

search_space = {}  # fill in your dict of parameters
with mlflow.start_run(run_name=my_run_name) as parent_run:
    # Carry both IDs to train_model through the search space
    run_id_value = parent_run.info.run_id
    search_space['parent_run_id'] = run_id_value
    search_space['experiment_id'] = experiment_id
    best_params = fmin(fn=train_model,
                       space=search_space,
                       # etc.
                       )&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 01 Apr 2025 03:58:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/nested-runs-don-t-group-correctly-in-mlflow/m-p/114134#M4010</guid>
      <dc:creator>kirpi</dc:creator>
      <dc:date>2025-04-01T03:58:20Z</dc:date>
    </item>
  </channel>
</rss>

