Databricks Community

appliable_ai · ‎06-11-2026

Hello,

I am following the "Get started: Build your first machine learning model on Databricks" tutorial, and am getting stuck on "Parallel training using Optuna".

When I search the runs to retrieve the best model, the following code fails because no models are logged against the runs:

best_model_pyfunc = mlflow.pyfunc.load_model(
  'runs:/{run_id}/model'.format(
    run_id=best_run.run_id
  )
)

When I open the runs in Experiments, the model for each run shows as "Failed", and I can't find any further detail anywhere.

Why is the following code (taken directly from the full, unaltered notebook provided by the tutorial) not logging a model against each run?

And how or where can I find out the cause? I tried adding logging inside the objective, but nothing showed up in the cell output. I know the code is running, because each run logs test_auc against it, just not the model.

def objective(trial):
  # Enable autologging on each worker
  mlflow.sklearn.autolog()
  with mlflow.start_run(nested=True):
    params = {
      'n_estimators': trial.suggest_int('n_estimators', 20, 1000),
      'learning_rate': trial.suggest_float('learning_rate', 0.05, 1.0, log=True),
      'max_depth': trial.suggest_int('max_depth', 2, 5),
    }
    model_hp = sklearn.ensemble.GradientBoostingClassifier(
      random_state=0,
      **params
    )
    model_hp.fit(X_train, y_train)
    predicted_probs = model_hp.predict_proba(X_test)
    # Tune based on the test AUC
    # In production, you could use a separate validation set instead
    roc_auc = sklearn.metrics.roc_auc_score(y_test, predicted_probs[:,1])
    mlflow.log_metric('test_auc', roc_auc)

    # Negate the AUC because Optuna minimizes the objective by default
    return -roc_auc


with mlflow.start_run(run_name='gb_optuna') as run:
  # Use the MLflow Tracking Server as the Optuna storage backend
  experiment_id = mlflow.active_run().info.experiment_id
  mlflow_storage = MlflowStorage(experiment_id=experiment_id)

  # MlflowSparkStudy distributes the tuning using Spark workers
  mlflow_study = MlflowSparkStudy(
    study_name="gb-optuna-tuning",
    storage=mlflow_storage,
  )

  mlflow_study.optimize(objective, n_trials=32, n_jobs=4)

balajij8 · ‎06-13-2026

You can change the tuning code to use a plain Optuna study instead of MlflowSparkStudy. The other steps in the tutorial can then be followed unchanged and the full notebook runs through.

Modified tuning code (use plain Optuna in the Free Edition to accommodate serverless authentication):

def objective(trial):
  # Enable autologging
  mlflow.sklearn.autolog()
  with mlflow.start_run(nested=True):
    params = {
      'n_estimators': trial.suggest_int('n_estimators', 20, 1000),
      'learning_rate': trial.suggest_float('learning_rate', 0.05, 1.0, log=True),
      'max_depth': trial.suggest_int('max_depth', 2, 5),
    }
    model_hp = sklearn.ensemble.GradientBoostingClassifier(
      random_state=0,
      **params
    )
    model_hp.fit(X_train, y_train)
    predicted_probs = model_hp.predict_proba(X_test)
    # Tune based on the test AUC
    # In production, you could use a separate validation set instead
    roc_auc = sklearn.metrics.roc_auc_score(y_test, predicted_probs[:,1])
    mlflow.log_metric('test_auc', roc_auc)

    # Negate the AUC because Optuna minimizes the objective by default
    return -roc_auc


import optuna  # needed for optuna.create_study below

with mlflow.start_run(run_name='gb_optuna') as run:
  # Use the MLflow Tracking Server as the Optuna storage backend
  experiment_id = mlflow.active_run().info.experiment_id
  mlflow_storage = MlflowStorage(experiment_id=experiment_id)

  # Create an Optuna study
  study = optuna.create_study(
    study_name="gb-optuna-tuning",
    storage=mlflow_storage,
    direction="minimize",
    load_if_exists=True
  )

  study.optimize(objective, n_trials=32, n_jobs=1)
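
If a model still does not show up, one way to surface the reason is to log the model explicitly inside the objective and store any exception as a tag on the run, so it is visible in the Experiments UI even when nothing is printed in the cell. This is only a rough sketch: run_and_record_error, the tag name model_logging_error, and the 4000-character cut-off are all made up for illustration and are not part of the tutorial or of MLflow.

```python
import traceback


def run_and_record_error(action, set_tag, tag_key="model_logging_error", max_len=4000):
    """Call action(); on failure, pass the traceback to set_tag and return False.

    action  -- zero-argument callable, e.g. a lambda that logs the model
    set_tag -- two-argument callable (key, value), e.g. mlflow.set_tag
    """
    try:
        action()
        return True
    except Exception:
        # Keep the tail of the traceback, where the actual error message is.
        # max_len is an arbitrary cut-off chosen for this sketch.
        set_tag(tag_key, traceback.format_exc()[-max_len:])
        return False


# Assumed usage inside objective(), after model_hp.fit(...):
#
#   run_and_record_error(
#       lambda: mlflow.sklearn.log_model(model_hp, "model"),
#       mlflow.set_tag,
#   )
```

With autologging still enabled this may log the model a second time, so you might want to call mlflow.sklearn.autolog(log_models=False) while debugging.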

Alternatively, you can create a Databricks ML cluster in a workspace and run the Get Started tutorial there, using the full, unaltered notebook provided by the tutorial.
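
To check which runs actually have a model before calling mlflow.pyfunc.load_model, something along these lines might help. It is a minimal sketch, not the tutorial's method: runs_missing_artifact is a hypothetical helper, and it assumes the model is stored as a run artifact under the path "model". On newer MLflow versions that track models separately from run artifacts, it may report runs as missing even though logging worked, so treat it as a first pass only.

```python
def runs_missing_artifact(client, run_ids, artifact_path="model"):
    """Return the run IDs that have nothing stored under artifact_path.

    client is expected to behave like mlflow.tracking.MlflowClient, i.e. to
    offer list_artifacts(run_id, path) returning a (possibly empty) list.
    """
    missing = []
    for run_id in run_ids:
        if not client.list_artifacts(run_id, artifact_path):
            missing.append(run_id)
    return missing


# Assumed usage in the notebook:
#
#   from mlflow.tracking import MlflowClient
#   runs = mlflow.search_runs(order_by=["metrics.test_auc DESC"])
#   print(runs_missing_artifact(MlflowClient(), runs["run_id"]))
```

If every run ID comes back as missing, the problem is in the model logging step rather than in the search or load code.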

Models failing in tutorial
