Hello,
I am following the "Get started: Build your first machine learning model on Databricks" tutorial and am stuck on the "Parallel training using Optuna" step.
When I search the runs to retrieve the best model, the following code fails because no models are logged against the runs:
best_model_pyfunc = mlflow.pyfunc.load_model(
    'runs:/{run_id}/model'.format(
        run_id=best_run.run_id
    )
)
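To narrow down which runs are affected before calling load_model, something like the sketch below could help. It assumes the model is stored as a run artifact under the path `model`, as the `runs:/{run_id}/model` URI implies. The helper name `runs_missing_artifact` is hypothetical; the only MLflow call it relies on is `MlflowClient.list_artifacts(run_id)`.

```python
def runs_missing_artifact(client, run_ids, artifact_path="model"):
    """Return the run IDs whose top-level artifacts do not include artifact_path.

    `client` is assumed to behave like mlflow.tracking.MlflowClient: it must
    expose list_artifacts(run_id) returning objects with a `.path` attribute.
    """
    missing = []
    for run_id in run_ids:
        # Collect the names of the top-level artifacts logged for this run.
        top_level = {info.path for info in client.list_artifacts(run_id)}
        if artifact_path not in top_level:
            missing.append(run_id)
    return missing


# Possible usage in the notebook (assumes mlflow is installed and configured):
# from mlflow.tracking import MlflowClient
# runs = mlflow.search_runs()  # pandas DataFrame with a "run_id" column
# print(runs_missing_artifact(MlflowClient(), runs["run_id"]))
```

If every trial run shows up in the result, the problem is with model logging in general rather than with one particular trial.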
When I open the runs in Experiments, the model for each run shows "Failed", and I cannot find any further detail anywhere.
Why is the following code (taken directly from the full, unaltered notebook provided by the tutorial) not logging a model against each run?
And how or where can I find out the cause? I tried adding logging inside the objective, but nothing showed up in the cell output. I know the code is running because each run logs test_auc, just not the model.
def objective(trial):
    # Enable autologging on each worker
    mlflow.sklearn.autolog()
    with mlflow.start_run(nested=True):
        params = {
            'n_estimators': trial.suggest_int('n_estimators', 20, 1000),
            'learning_rate': trial.suggest_float('learning_rate', 0.05, 1.0, log=True),
            'max_depth': trial.suggest_int('max_depth', 2, 5),
        }
        model_hp = sklearn.ensemble.GradientBoostingClassifier(
            random_state=0,
            **params
        )
        model_hp.fit(X_train, y_train)
        predicted_probs = model_hp.predict_proba(X_test)
        # Tune based on the test AUC
        # In production, you could use a separate validation set instead
        roc_auc = sklearn.metrics.roc_auc_score(y_test, predicted_probs[:, 1])
        mlflow.log_metric('test_auc', roc_auc)
        # Negate the AUC because Optuna minimizes the objective by default
        return -roc_auc
with mlflow.start_run(run_name='gb_optuna') as run:
    # Use the MLflow Tracking Server as the Optuna storage backend
    experiment_id = mlflow.active_run().info.experiment_id
    mlflow_storage = MlflowStorage(experiment_id=experiment_id)
    # MlflowSparkStudy distributes the tuning using Spark workers
    mlflow_study = MlflowSparkStudy(
        study_name="gb-optuna-tuning",
        storage=mlflow_storage,
    )
    mlflow_study.optimize(objective, n_trials=32, n_jobs=4)
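Since output produced on the Spark workers does not appear in the notebook cell, one possible way to surface what happens during model logging is to capture log messages and Python warnings inside the objective and attach them to the run. This is only a sketch built on the standard library. The name `capture_diagnostics` is hypothetical, and it assumes MLflow reports autologging problems through the `mlflow` logger or through `warnings`, which may not cover every failure.

```python
import contextlib
import io
import logging
import warnings


@contextlib.contextmanager
def capture_diagnostics(logger_name="mlflow", level=logging.DEBUG):
    """Collect log records from `logger_name` (and its children) plus warnings."""
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger = logging.getLogger(logger_name)
    previous_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            try:
                yield buffer
            finally:
                # Append recorded warnings even if the wrapped code raised.
                for w in caught:
                    buffer.write(f"WARNING {w.category.__name__}: {w.message}\n")
    finally:
        # Restore the logger so repeated trials do not stack handlers.
        logger.removeHandler(handler)
        logger.setLevel(previous_level)


# Possible usage inside objective(), within the nested run:
# with capture_diagnostics() as buf:
#     model_hp.fit(X_train, y_train)
# mlflow.log_text(buf.getvalue() or "no messages captured", "worker_diagnostics.txt")
```

If uploading artifacts is itself the step that fails, `mlflow.log_text` would fail too; in that case a truncated copy of the captured text in a run tag via `mlflow.set_tag` might serve as a fallback.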