How to properly use Databricks Managed MLflow tracking/registry with a Databricks Workflow

GGG_P
New Contributor III

Hey.

I'm building a DevOps/MLOps pipeline to train and register a simple scikit-learn model.

I created a simple Databricks Workflow to execute the training and registration task on a specific git branch. (The Workflow is set up with a Databricks Repo on a specific branch, with a notebook as input.)

FYI: Everything works fine when I run my notebook as a standalone notebook in my Workspace.

During the Databricks Workflow execution, I realized that I need to define my own 'experiment_name' (see the error below):

2022/12/08 04:36:32 WARNING mlflow.tracking.default_experiment.registry: Encountered unexpected error while getting experiment_id: FEATURE_DISABLED: Creation of experiments in jobs is not enabled. If using the Python fluent API, you can set an active experiment under which to create runs by calling mlflow.set_experiment("experiment_name") at the start of your program.
2022/12/08 04:36:32 WARNING mlflow.tracking.default_experiment.registry: Encountered unexpected error while getting experiment_id: None has type NoneType, but expected one of: bytes, unicode

So I called set_tracking_uri with a specific custom folder.

I also created my experiment.

mlflow.set_tracking_uri("/my_custom_folder/")
# Note: create_experiment returns an experiment ID, not a run ID
experiment_id = mlflow.create_experiment("my_exp_from_databricks")

MLflow is able to log everything, BUT the results are no longer managed by Databricks MLflow.

I can't see anything in the Databricks MLflow UI.

I guess that my tracking_uri is wrong, but I have no idea what to set it to so that the runs show up in the Databricks MLflow UI.

My question is simple: is it possible to run, log, and register a model using Databricks Managed MLflow from a Databricks Workflow?

Thank you.

GGG_P
New Contributor III

It works just by setting the experiment to a specific workspace path:

mlflow.set_experiment(f"/Users/{username}/my_exp")
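
For anyone landing here later, a slightly fuller sketch of the same idea. The helper names (`experiment_path`, `use_workspace_experiment`) are made up for illustration, and this assumes the notebook runs on a Databricks cluster where `mlflow` is installed. As far as I can tell, inside Databricks the tracking URI already points at the managed server, so the `set_tracking_uri` call from the question should not be needed; treat that as an assumption to verify in your workspace.

```python
def experiment_path(username: str, name: str) -> str:
    """Build an absolute workspace path for an MLflow experiment (hypothetical helper)."""
    # Plain {username} interpolation: inside a Python f-string, "${username}"
    # would leave a literal "$" in the resulting path.
    return f"/Users/{username}/{name}"


def use_workspace_experiment(username: str, name: str) -> str:
    """Make the experiment at the workspace path the active one (hypothetical helper)."""
    # Imported here so the path helper above can be used without mlflow installed.
    import mlflow

    path = experiment_path(username, name)
    # set_experiment activates the experiment with this name and creates it
    # first if it does not exist yet.
    mlflow.set_experiment(path)
    return path
```

In the job notebook you would call `use_workspace_experiment(username, "my_exp")` once, before any `mlflow.start_run()` or autologging happens.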

BernardoC
New Contributor II

Nice contribution!

kdatt
Databricks Partner

I had the same issue while trying to call a notebook from a workflow. I was able to do what you did, but it needed a new experiment name for each run, so I had to do this:

import time
import mlflow

# Set the experiment (env and experiment are assumed to be defined earlier)
experiment_name = f"/Workspace/MLOps/{env}/experiment/{experiment}_{time.strftime('%Y-%m-%d_%H-%M-%S')}"
mlflow.set_experiment(experiment_name)
 
But this assigns a new experiment ID on each run, which doesn't work for me because I was hardcoding that ID for inference.
Not sure what the best option is here.
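
One possible way around the hardcoded ID, sketched under some assumptions: keep the experiment name stable (no timestamp) and look the ID up by name at inference time instead of hardcoding it. `mlflow.set_experiment` is documented to reuse an existing experiment with the given name, so a fresh name per run may not actually be required, but I have not verified that in your workspace. The helper names below are hypothetical, and the path simply mirrors the one in the post above.

```python
def stable_experiment_name(env: str, experiment: str) -> str:
    """Return the same experiment path on every run (hypothetical helper)."""
    # No timestamp in the name, so repeated runs resolve to one experiment.
    return f"/Workspace/MLOps/{env}/experiment/{experiment}"


def resolve_experiment_id(name: str) -> str:
    """Look up the experiment ID by name instead of hardcoding it (hypothetical helper)."""
    # Imported here so the name helper above can be used without mlflow installed.
    import mlflow

    found = mlflow.get_experiment_by_name(name)
    if found is None:
        # get_experiment_by_name returns None when no experiment has this name.
        raise LookupError(f"No MLflow experiment named {name!r}")
    return found.experiment_id
```

If the timestamp is still useful for telling runs apart, it could move into the run name instead, for example `mlflow.start_run(run_name=time.strftime('%Y-%m-%d_%H-%M-%S'))`, while the experiment itself stays fixed.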