Hi, I'm Yuki,
I'm trying to work out when I should use register_model.
In my case, I run a training batch once a week, and if the new model is good, I want to update the champion.
I wrote the following code to register the model only if its score is the best.
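import mlflow
from mlflow import MlflowClient
from sklearn.ensemble import RandomForestRegressor

# X, y and experiment_id are defined elsewhere
model_name = "prod.ml_team.iris_model"

# start run
with mlflow.start_run():
    clf = RandomForestRegressor(n_estimators=100)
    clf.fit(X, y)
    # log model to the run; the artifact path must match the "runs:/.../model" URI used below
    mlflow.sklearn.log_model(sk_model=clf, artifact_path="model")

# search best model of all runs (assumes every run logs a test_f1 metric, not shown here)
best_run = mlflow.search_runs([experiment_id], order_by=["metric.test_f1 DESC"]).iloc[0]
# register model
result = mlflow.register_model(f"runs:/{best_run.run_id}/model", model_name)
# create "Champion" alias for the best version
client = MlflowClient()
client.set_registered_model_alias(model_name, "Champion", result.version)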
This avoids registering many models in the model registry, which keeps it clean.
But the code doesn't feel right to me, and something about it seems strange.
First, in the documentation, the model is registered on every run, easily and smoothly, like below.
mlflow.sklearn.log_model(
    sk_model=clf,
    artifact_path="model",
    # The signature is automatically inferred from the input example and its predicted output.
    input_example=input_example,
    registered_model_name="prod.ml_team.iris_model",
)
In which cases is this approach the right one to use?
Second, my code may register the same model or source run more than once. Of course, I can check for duplicates before registering the model, but I feel I'm missing some knowledge here. If all I want is the best model, I can call mlflow.search_runs([experiment_id], order_by=["metric.test_f1 DESC"]).iloc[0] every time, and there is no need for registration.
I don't grasp the core idea of the model registry.
How does everyone else handle this?
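For reference, the duplicate check I have in mind would look roughly like the sketch below. It is only a sketch under assumptions: find_version_for_run and register_once are hypothetical helper names of my own, not MLflow functions. The client argument is expected to be an MlflowClient, and register is expected to be mlflow.register_model. I also assume the model was logged under the artifact path "model".

```python
def find_version_for_run(client, model_name, run_id):
    """Return the registered version whose source run is run_id, or None.

    Hypothetical helper: `client` is assumed to be an MlflowClient (or any
    object exposing a compatible search_model_versions method).
    """
    for mv in client.search_model_versions(f"name='{model_name}'"):
        if mv.run_id == run_id:
            return mv
    return None


def register_once(client, register, model_name, run_id, artifact_path="model"):
    """Register the run's model only if that run has no version yet.

    Hypothetical helper: `register` is assumed to be mlflow.register_model,
    passed in so that the function itself does not import mlflow.
    """
    existing = find_version_for_run(client, model_name, run_id)
    if existing is not None:
        # This run was already registered; reuse its version.
        return existing
    return register(f"runs:/{run_id}/{artifact_path}", model_name)
```

With this, the registration step in my weekly batch would become something like version = register_once(MlflowClient(), mlflow.register_model, model_name, best_run.run_id), and the "Champion" alias would then be set to version.version. I have not checked how the search behaves when the registered model does not exist yet, so that case may need separate handling.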