
Hyperopt (15.4 LTS ML) ignores autologger settings

art1
New Contributor III

I use MLflow Experiments to store models once they leave very early testing and development. I recently switched to 15.4 LTS ML and was hit by unhinged Hyperopt behavior:

  1. It was creating Experiment logs even though i) the autologger is turned off at the workspace level for notebooks, and ii) I explicitly disabled it with mlflow.autolog(disable=True) in the notebook.
  2. The created Experiment cannot be deleted from my Experiments (all options are dimmed).

I want my control over these processes back.

1 REPLY

Louis_Frolio
Databricks Employee

Hey @art1, sorry this post got lost in the shuffle. Here are some things to consider regarding your question:

 

Thanks for flagging this. What you're seeing is expected given how Databricks integrates Hyperopt with MLflow, and there are clear ways to get your control back.
 

What’s causing the unexpected logging

  • Hyperopt on Databricks uses its own automated MLflow tracking integration, which is independent of the standard mlflow.autolog() feature. That’s why runs are still logged even if autologging is disabled at the workspace level or via mlflow.autolog(disable=True) in your notebook.
  • Hyperopt is deprecated and scheduled for removal in the next major Databricks ML runtime, so steering toward Optuna or Ray Tune is recommended if you want tighter control over logging behavior going forward.

How to stop Hyperopt from auto-logging

Pick one of these approaches:

  • Don’t use SparkTrials; use the default Trials class. Databricks’ automated MLflow logging for Hyperopt is tied to SparkTrials. With the default Trials class (whether the training inside each trial is single-node or distributed), Databricks does not auto-log to MLflow, and you control what gets logged manually.
    from hyperopt import fmin, tpe, hp, Trials
    import mlflow

    # Example search space; replace with your own hyperparameters.
    search_space = {"x": hp.uniform("x", -10.0, 10.0)}

    def objective(params):
        # your training here; return a scalar loss
        loss = (params["x"] - 3.0) ** 2
        return loss

    trials = Trials()  # default Trials -> no Databricks auto-logging
    best = fmin(fn=objective, space=search_space, algo=tpe.suggest, max_evals=50, trials=trials)

    # If you want logging, do it explicitly:
    # with mlflow.start_run():
    #     mlflow.log_params(best)
    #     mlflow.log_metric("final_loss", trials.best_trial["result"]["loss"])
    Databricks documents that with distributed training algorithms you should use Trials (not SparkTrials) and call MLflow manually if you want logging.
  • Switch to Optuna or Ray Tune. Both integrate cleanly with MLflow and let you opt in to logging via callbacks or explicit API calls, so nothing is logged unless you choose to (see the Optuna sketch after this list).
  • If you must keep SparkTrials, redirect where it logs. You can set the active experiment (to a workspace experiment you own) or change the tracking URI to a path you control. This doesn’t disable logging, but it keeps it confined to the experiment you choose.
    import mlflow
    mlflow.set_tracking_uri("databricks")  # default
    mlflow.set_experiment("/Users/you@databricks.com/controlled-experiment")  # a workspace experiment you created
 

Why you can’t delete the “experiment” in the UI

  • Notebook experiments are special. When MLflow runs start without an active experiment, Databricks automatically creates a “notebook experiment” attached to the notebook. These cannot be renamed or deleted from the MLflow UI; the controls appear disabled because they’re bound to the notebook’s lifecycle.
  • Deleting a notebook experiment via the API moves the notebook to Trash. If you use MlflowClient().delete_experiment(experiment_id) on a notebook experiment, Databricks moves the notebook itself to the Trash folder. That’s by design (a sketch follows this list).
  • Experiments created from notebooks in Git folders have further limitations. You can’t directly manage rename/delete/permissions on those experiments; you must operate at the Git folder level.
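
If you do go the API route, a minimal sketch follows; the experiment ID is a placeholder you would look up in the UI or via mlflow.search_experiments().

    from mlflow import MlflowClient

    client = MlflowClient()
    # Caution: on a notebook experiment, this also moves the backing notebook to Trash.
    client.delete_experiment("<experiment_id>")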

Regain control—practical steps

  • Set an explicit workspace experiment at the top of your notebook. That ensures runs never go to the auto-created notebook experiment and gives you full control of lifecycle in the UI.
  import mlflow
  mlflow.set_tracking_uri("databricks")
  mlflow.set_experiment("/Users/you@databricks.com/my-experiment")
  
  • Avoid SparkTrials to stop Hyperopt auto-logging and wrap your training explicitly in mlflow.start_run() only where you want logs.
  • Consider Optuna or Ray Tune for HPO in 15.4 LTS ML and beyond; logging is opt-in and Hyperopt is being removed in the next major ML runtime.
  • If you need to remove the current notebook experiment: either
    • Leave it and start logging to a workspace experiment as above, or
    • Use the MLflow API to delete it (knowing the notebook will go to Trash).

Notes on workspace autologging

  • Disabling Databricks Autologging affects framework autologgers (sklearn, PyTorch, XGBoost, etc.), but it does not affect the separate Hyperopt automated tracking integration, which is why it didn’t solve the issue.
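
To make that concrete, a toy sketch (dataset and model are placeholders): disabling autologging silences the framework autologgers, but a Hyperopt search driven by SparkTrials would still be tracked by the separate integration described above.

  import mlflow
  import numpy as np
  from sklearn.linear_model import LinearRegression

  # Disables framework autologgers (sklearn, XGBoost, PyTorch, ...):
  mlflow.autolog(disable=True)

  X, y = np.arange(10).reshape(-1, 1), np.arange(10)
  LinearRegression().fit(X, y)  # no MLflow run is created for this fit

  # A Hyperopt search using SparkTrials, however, is still auto-logged by
  # Databricks' separate Hyperopt-MLflow integration.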
 
Hope this helps, Louis.
