Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

When does everyone use the model registry?

Yuki
Contributor

Hi, I'm Yuki.

I'm trying to work out when I should use register_model.

In my case, I run a training batch once a week, and if the new model is good, I want to make it the champion.

I wrote code that registers a model only when its score is the best so far:

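    import mlflow
    from mlflow.tracking import MlflowClient
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score

    model_name = "prod.ml_team.iris_model"

    # start run
    with mlflow.start_run():
        clf = RandomForestClassifier(n_estimators=100)  # F1 is a classification metric
        clf.fit(X, y)
        # log the metric used to rank runs below (X_test / y_test: a held-out split)
        mlflow.log_metric("test_f1", f1_score(y_test, clf.predict(X_test), average="macro"))
        # log the model under "model", the path used in the runs:/ URI below;
        # the input example lets MLflow infer a signature, which Unity Catalog requires
        mlflow.sklearn.log_model(sk_model=clf, artifact_path="model", input_example=X[:5])

    # search best model of all runs
    best_run = mlflow.search_runs([experiment_id], order_by=["metric.test_f1 DESC"]).iloc[0]
    # register model (creates a new version on every call)
    result = mlflow.register_model(f"runs:/{best_run.run_id}/model", model_name)
    # point the "Champion" alias at that version
    client = MlflowClient()
    client.set_registered_model_alias(model_name, "Champion", result.version)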

My intent is to avoid registering every model I train, so that the model registry stays clean.

But the code doesn't feel quite right to me, for two reasons.

First, the documentation shows that you can register the model on every run, simply by passing registered_model_name to log_model:

    mlflow.sklearn.log_model(
        sk_model=clf,
        artifact_path="model",
        # The signature is automatically inferred from the input example and its predicted output.
        input_example=input_example,
        registered_model_name="prod.ml_team.iris_model",
    )

In which cases should this approach be used?
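Both snippets above rely on two registry behaviours: each registration, whether through register_model or through log_model's registered_model_name argument, creates a new version, and an alias such as "Champion" is a pointer that can be moved between versions. The toy class below is hypothetical and is not MLflow code; it models only those two behaviours, to show the "register every run, move only the alias" pattern that registered_model_name fits naturally. The run IDs and scores are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyRegisteredModel:
    """Toy in-memory stand-in for one registered model; NOT the MLflow API.

    It mimics only two behaviours: every registration adds a new version,
    and an alias is a movable pointer to exactly one version.
    """

    versions: List[str] = field(default_factory=list)  # versions[i] = source run of version i + 1
    aliases: Dict[str, int] = field(default_factory=dict)

    def register(self, run_id: str) -> int:
        # Like register_model or log_model(registered_model_name=...):
        # always creates a new version, even for a run registered before
        self.versions.append(run_id)
        return len(self.versions)

    def set_alias(self, alias: str, version: int) -> None:
        # Moving an alias neither creates nor deletes versions
        self.aliases[alias] = version

    def run_for_alias(self, alias: str) -> str:
        # Roughly what loading "models:/<name>@<alias>" resolves to
        return self.versions[self.aliases[alias] - 1]


# "Register every run" pattern: every weekly run becomes a version,
# and only the Champion alias moves when a run beats the best so far.
model = ToyRegisteredModel()
weekly_runs = [("run_a", 0.81), ("run_b", 0.78), ("run_c", 0.86)]  # hypothetical scores
best_f1 = None
for run_id, f1 in weekly_runs:
    version = model.register(run_id)
    if best_f1 is None or f1 > best_f1:
        best_f1 = f1
        model.set_alias("Champion", version)

print(model.versions, model.aliases)  # ['run_a', 'run_b', 'run_c'] {'Champion': 3}
```

In this pattern the registry keeps every candidate as a version, and consumers only ever follow the alias; whether that history counts as clutter or as an audit trail is a design choice.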

Second, my code can register the same source run more than once: whenever no new run beats the current best, search_runs returns the same run as last week, and register_model creates yet another version of it. I could check for duplicates before registering, but I suspect I'm missing something more basic. And if all I need is the best model, I could just call mlflow.search_runs([experiment_id], order_by=["metric.test_f1 DESC"]).iloc[0] every time, with no registration at all.
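One way to avoid both the duplicate versions and the full re-search is to compare only this week's run against the current Champion, and register it only when it wins. Below is a dependency-free sketch of that decision rule. The function name and the min_gain margin are hypothetical; the MLflow calls in the trailing comments (MlflowClient.get_model_version_by_alias, MlflowClient.get_run) are the assumed wiring, and this_run_id / this_run_f1 are placeholder names for this week's run.

```python
from typing import Optional


def should_promote(new_score: float, champion_score: Optional[float], min_gain: float = 0.0) -> bool:
    """Return True if this week's run should replace the current Champion.

    champion_score is None when no version holds the "Champion" alias yet.
    min_gain is a hypothetical margin that keeps noise-level gains from swapping models.
    """
    if champion_score is None:
        return True
    return new_score > champion_score + min_gain


# Hypothetical wiring (not run here); this_run_id and this_run_f1 come from this week's run:
#   from mlflow.exceptions import MlflowException
#   try:
#       champion = client.get_model_version_by_alias(model_name, "Champion")
#       champion_f1 = client.get_run(champion.run_id).data.metrics["test_f1"]
#   except MlflowException:  # no Champion alias yet
#       champion_f1 = None
#   if should_promote(this_run_f1, champion_f1):
#       result = mlflow.register_model(f"runs:/{this_run_id}/model", model_name)
#       client.set_registered_model_alias(model_name, "Champion", result.version)

print(should_promote(0.86, 0.81), should_promote(0.78, 0.81))  # True False
```

Because only the new run is ever registered, no run is registered twice, and the registry grows only when the Champion actually changes.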

 

I don't think I've grasped the core idea of the model registry.

How do you all handle this?


Kumaran
Databricks Employee

Hi @Yuki,

Thank you for contacting the Databricks community.

  • If you run register_model with the same run twice, you’ll create multiple versions pointing to the same source.

  • To avoid that, you can check if the run is already registered before creating a new version:

     
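    import mlflow
    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    # Compare run IDs rather than source strings: the stored source may be the
    # resolved artifact location, not the "runs:/..." URI passed to register_model
    registered_run_ids = {
        v.run_id for v in client.search_model_versions(f"name='{model_name}'")
    }
    if best_run.run_id not in registered_run_ids:
        result = mlflow.register_model(f"runs:/{best_run.run_id}/model", model_name)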
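The check above skips registration, but when the run is already registered, result is never assigned, so a following set_registered_model_alias call has no version to point at. A small helper that returns the existing version instead lets the alias step run either way. It is a sketch: the helper name is hypothetical, and it only assumes that each item returned by search_model_versions exposes run_id and version attributes.

```python
from types import SimpleNamespace
from typing import Iterable, Optional, Union


def existing_version_for_run(versions: Iterable, run_id: str) -> Optional[Union[int, str]]:
    """Return the newest version already registered from run_id, or None.

    `versions` can be anything whose items expose .run_id and .version, such as
    the result of client.search_model_versions(f"name='{model_name}'").
    """
    matches = [v for v in versions if v.run_id == run_id]
    if not matches:
        return None
    # version may come back as a str or an int, so compare numerically
    return max(matches, key=lambda v: int(v.version)).version


# Hypothetical wiring, continuing the snippet above (not run here):
#   versions = client.search_model_versions(f"name='{model_name}'")
#   version = existing_version_for_run(versions, best_run.run_id)
#   if version is None:
#       version = mlflow.register_model(f"runs:/{best_run.run_id}/model", model_name).version
#   client.set_registered_model_alias(model_name, "Champion", version)

# Stand-in objects shaped like ModelVersion, for a quick check
demo = [
    SimpleNamespace(run_id="r1", version="1"),
    SimpleNamespace(run_id="r2", version="2"),
    SimpleNamespace(run_id="r1", version="3"),
]
print(existing_version_for_run(demo, "r1"), existing_version_for_run(demo, "r9"))  # 3 None
```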