When does everyone use the model registry?

Yuki
Contributor

Hi, I'm Yuki,

I'm trying to work out when I should use register_model.

In my case, I run a training batch once a week, and if the new model is good, I want to update the champion.

I have written code that registers the model only if its score is the best.

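import mlflow
from mlflow.tracking import MlflowClient
from sklearn.ensemble import RandomForestRegressor

model_name = "prod.ml_team.iris_model"

# start run
with mlflow.start_run():
    clf = RandomForestRegressor(n_estimators=100)
    clf.fit(X, y)
    # log model to the run under the "model" artifact path,
    # so that runs:/<run_id>/model below resolves
    mlflow.sklearn.log_model(sk_model=clf, artifact_path="model")

# search best model of all runs
best_run = mlflow.search_runs([experiment_id], order_by=["metrics.test_f1 DESC"]).iloc[0]
# register model
result = mlflow.register_model(f"runs:/{best_run.run_id}/model", model_name)
# create "Champion" alias for the best version (same model name as registered above)
client = MlflowClient()
client.set_registered_model_alias(model_name, "Champion", result.version)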

 

This avoids registering many models in the model registry and keeps it clean.

But I feel that the code is not ideal, and it seems strange to me.

First, according to the documentation, we can easily register the model on every run, as shown below.

    mlflow.sklearn.log_model(
        sk_model=clf,
        artifact_path="model",
        # The signature is automatically inferred from the input example and its predicted output.
        input_example=input_example,
        registered_model_name="prod.ml_team.iris_model",
    )

When is this approach the right one to use?

Second, my code may create duplicate model versions from the same source run. Of course, I can check for duplicates before registering the model, but I suspect I'm missing something. If all I want is the best model, I could call mlflow.search_runs([experiment_id], order_by=["metric.test_f1 DESC"]).iloc[0] every time, with no need for registration at all.

 

I don't think I grasp the core idea of the model registry.

How does everyone handle this?

Kumaran
Databricks Employee

Hi @Yuki,

Thank you for contacting the Databricks community.

  • If you run register_model with the same run twice, you’ll create multiple versions pointing to the same source.

  • To avoid that, you can check if the run is already registered before creating a new version:

     
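    import mlflow
    from mlflow.tracking import MlflowClient
    
    client = MlflowClient()
    # Compare run IDs rather than source URIs: the stored source is typically
    # the resolved artifact location, not the runs:/ URI passed to register_model.
    registered_run_ids = {
        v.run_id for v in client.search_model_versions(f"name='{model_name}'")
    }
    if best_run.run_id not in registered_run_ids:
        result = mlflow.register_model(f"runs:/{best_run.run_id}/model", model_name)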
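
For a weekly job, the duplicate check and the alias update could be wrapped in one small helper. This is only a rough sketch, not an official pattern: register_if_new is a hypothetical name, and it assumes the model was logged under the "model" artifact path as in the question. The client and the register function are passed in as arguments so the helper can be tried out without a tracking server.

```python
def register_if_new(client, register_fn, model_name, run_id, alias="Champion"):
    """Register runs:/<run_id>/model unless that run already has a version,
    then point `alias` at that version. Returns the version.

    `client` is expected to behave like MlflowClient and `register_fn`
    like mlflow.register_model (both are assumptions of this sketch).
    """
    # Look for an existing version created from the same source run.
    for v in client.search_model_versions(f"name='{model_name}'"):
        if v.run_id == run_id:
            version = v.version
            break
    else:
        # No version for this run yet, so create one.
        version = register_fn(f"runs:/{run_id}/model", model_name).version

    # Move the alias so consumers that load by alias pick up this version.
    client.set_registered_model_alias(model_name, alias, version)
    return version
```

With MLflow this would be called as register_if_new(MlflowClient(), mlflow.register_model, model_name, best_run.run_id). Calling it twice for the same run reuses the existing version instead of creating a second one.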