topic Model flavour using feature store model training log_model() in Machine Learning

Model flavour using feature store model training log_model()

Edna — Thu, 11 Apr 2024 06:00:22 GMT

Hi, I have successfully registered my model using the Feature Engineering client with the following code:

import mlflow
import xgboost as xgb
from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()

with mlflow.start_run():
    # Calculate the ratio of negative class samples to positive class samples
    ratio = (len(y_train) - y_train.sum()) / y_train.sum()

    # Fit model
    xgb_model = xgb.XGBClassifier(scale_pos_weight=ratio)
    xgb_model.fit(X_train, y_train)

    fe.log_model(
        model=xgb_model,
        artifact_path=MODEL_NAME,
        flavor=mlflow.sklearn,
        training_set=training_set,
        registered_model_name=MODEL_NAME,
    )
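The class-balance ratio in the snippet above is the number of negative samples divided by the number of positive samples, which is the heuristic the XGBoost documentation suggests for `scale_pos_weight`. A minimal sketch on a plain list of 0/1 labels; the helper name `scale_pos_weight_ratio` is hypothetical:

```python
def scale_pos_weight_ratio(labels):
    """Return (#negatives / #positives) for binary labels encoded as 0/1."""
    positives = sum(labels)              # count of 1s
    negatives = len(labels) - positives  # every remaining label is a 0
    if positives == 0:
        raise ValueError("labels contain no positive samples")
    return negatives / positives


# Usage: 9 negatives and 3 positives, so positives are weighted 3x
print(scale_pos_weight_ratio([0] * 9 + [1] * 3))  # 3.0
```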

There are two questions:

1. Why is the model still shown as pyfunc in the model registry when the flavor I specified was mlflow.sklearn?

2. Can I use the following code for prediction:

model = mlflow.sklearn.load_model(model_version_uri)

# Predict with model
prob_pred = model.predict_proba(df)[:, 1]

Or must I use score_batch()? I need the predictions to be probabilities rather than 0/1 labels.

Thanks!

#model_flavor #feature_store #score_batch #xgboost #sklearn


Kumaran — Tue, 30 Apr 2024 18:46:29 GMT

Hello @Edna

Thank you for contacting Databricks community support.

MLflow allows you to save models using different "flavors," which are essentially different ways of serializing and deserializing models. When you specify flavor=mlflow.sklearn, you're telling MLflow to save the model using the scikit-learn flavor.

However, MLflow also records a generic pyfunc (python_function) flavor for the model when it is logged, in addition to the scikit-learn flavor. This is because pyfunc is a generic flavor that can be used to load and serve models in a variety of environments, regardless of the flavor used to save the model.

So even though you specified flavor=mlflow.sklearn, the model will still be shown as pyfunc in the model registry. This is expected behavior and allows the model to be easily deployed in a variety of environments.

If you want to deploy the model using the scikit-learn flavor specifically, you can do so by specifying the flavor when you load the model from the registry. For example:

import mlflow
import xgboost as xgb

# Load the model using the scikit-learn flavor
model = mlflow.sklearn.load_model(f"models:/{MODEL_NAME}/1")

# Use the model to make predictions
predictions = model.predict(X_test)

In this example, mlflow.sklearn.load_model() is used to load the model using the scikit-learn flavor, even though the model is registered as a pyfunc in the model registry.
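Whether a flavor-specific loader such as mlflow.sklearn.load_model or mlflow.catboost.load_model works depends on which flavor keys are listed in the model's MLmodel metadata; if only python_function is listed, the flavor-specific loader raises a "Model does not have the ... flavor" error like the one reported later in this thread. A minimal sketch, assuming MLflow 2.x, where `mlflow.models.get_model_info(model_uri).flavors` returns that mapping; the helper `choose_loader` and its preference order are hypothetical, for illustration only:

```python
def choose_loader(flavors):
    """Pick an MLflow loader name from the flavor keys of an MLmodel file.

    `flavors` is a dict such as the one returned by
    mlflow.models.get_model_info(model_uri).flavors (assumes MLflow 2.x).
    """
    # Native flavors first: their loaders return the original estimator,
    # so methods like predict_proba() remain available.
    for key in ("sklearn", "xgboost", "catboost"):
        if key in flavors:
            return f"mlflow.{key}.load_model"
    # Otherwise only the generic pyfunc interface (a single predict()) is available.
    if "python_function" in flavors:
        return "mlflow.pyfunc.load_model"
    raise ValueError(f"no supported flavor among {sorted(flavors)}")


print(choose_loader({"python_function": {}, "sklearn": {}}))  # mlflow.sklearn.load_model
print(choose_loader({"python_function": {}}))                 # mlflow.pyfunc.load_model
```

On a real registered model, the dict would come from `mlflow.models.get_model_info(model_uri).flavors`, which requires MLflow and access to the model URI.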


MiStankai — Fri, 19 Jul 2024 07:33:40 GMT

Hi @Kumaran! Thank you for this response! Unfortunately, the same approach does not work for me with a CatBoost model, even though MLflow supports the mlflow.catboost flavor. Could you help me with this?
These are the libraries I'm using:

%pip install 'catboost==1.2.5' -q 
%pip install 'databricks-feature-engineering==0.6.0' -q 
%pip install 'mlflow==2.14.3' -q 
%pip install 'shap==0.44.0' -q

I log the model with:

with mlflow.start_run():
    fe.log_model(
        model=model,
        artifact_path='model',
        flavor=mlflow.catboost,
        training_set=training_set,
        registered_model_name=model_uc_name,
        signature=signature,
        input_example=X_train.head(1),
    )

I load it like this:

best_model = mlflow.catboost.load_model(model_uri)

And I get this error:

MlflowException: Model does not have the "catboost" flavor.

I need to use the FE client so that I can use your cool Feature Lookups. Please help, I'd really appreciate it!

Cheers!!


robbe — Mon, 22 Jul 2024 08:50:02 GMT

@Edna Unfortunately, it seems that the only way to load a model logged with the Feature Store client for batch scoring is to use fe.score_batch(model_uri, df).

If you need the model to predict probabilities, you could log a custom mlflow.pyfunc.PythonModel (https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#pyfunc-create-custom) whose predict() method returns the result of model.predict_proba().
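The wrapper idea above can be sketched as a `PythonModel` subclass whose `predict()` returns only the positive-class column of `predict_proba()`. This is a minimal sketch: the class name `PositiveClassProbability` is hypothetical, the guarded import exists only so the snippet runs where MLflow is not installed, and the wrapped estimator is assumed to follow the scikit-learn `predict_proba()` convention (one `[P(class 0), P(class 1)]` row per sample).

```python
try:
    from mlflow.pyfunc import PythonModel
except ImportError:  # assumption: fallback so the sketch runs without MLflow installed
    PythonModel = object


class PositiveClassProbability(PythonModel):
    """Hypothetical wrapper: predict() returns P(class == 1) instead of 0/1 labels."""

    def __init__(self, trained_model):
        self.model = trained_model

    def predict(self, context, model_input, params=None):
        # predict_proba returns one row per sample: [P(class 0), P(class 1)]
        probabilities = self.model.predict_proba(model_input)
        # Keep only the positive-class column; iterating rows works for
        # nested lists and 2-D NumPy arrays alike
        return [row[1] for row in probabilities]
```

With NumPy output alone, `probabilities[:, 1]` is the more idiomatic slice. An instance of such a wrapper is what gets passed to fe.log_model(..., flavor=mlflow.pyfunc, ...), as in the next post.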


Edna — Mon, 22 Jul 2024 09:29:35 GMT

Thanks for your reply, @robbe. Yes, I have created a custom pyfunc model, and I can now use fe.score_batch() to return probabilities. Here is the code:

import matplotlib.pyplot as plt
import mlflow
import pandas as pd
import xgboost as xgb
from databricks.feature_engineering import FeatureEngineeringClient
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    RocCurveDisplay,
    precision_score,
    recall_score,
    roc_auc_score,
)

fe = FeatureEngineeringClient()

# Calculate the ratio of negative class samples to positive class samples
ratio = (len(y_train) - y_train.sum()) / y_train.sum()

# Fit model
xgb_model = xgb.XGBClassifier(scale_pos_weight=ratio, enable_categorical=True)
xgb_model.fit(X_train, y_train)
y_probs = xgb_model.predict_proba(X_test)
y_pred = pd.Series([1 if prob > 0.5 else 0 for prob in y_probs[:, 1]], index=y_test.index)


class churnProbability(mlflow.pyfunc.PythonModel):
    def __init__(self, trained_model):
        self.model = trained_model

    def preprocess_result(self, model_input):
        return model_input

    def predict(self, context, model_input):
        processed_df = self.preprocess_result(model_input.copy())
        processed_df["utility_code"] = processed_df["utility_code"].astype("category")
        processed_df["payment_method_name"] = processed_df["payment_method_name"].astype("category")
        results = self.model.predict_proba(processed_df)
        # Return only the probability of the positive class (churn)
        return results[:, 1]


pyfunc_model = churnProbability(xgb_model)

# End the current MLflow run and start a new one to log the new pyfunc model
mlflow.end_run()
with mlflow.start_run() as run:
    fe.log_model(
        model=pyfunc_model,
        artifact_path=MODEL_NAME,
        flavor=mlflow.pyfunc,
        training_set=training_set,
        registered_model_name=MODEL_NAME,
    )

    # Logging relevant metrics for experiment run comparison and for posterity.
    # ROC-AUC is computed from the positive-class probabilities, not the thresholded labels.
    mlflow.log_metrics({
        'Precision Score': precision_score(y_test, y_pred),
        'Recall Score': recall_score(y_test, y_pred),
        'ROC-AUC Score': roc_auc_score(y_test, y_probs[:, 1]),
    })

    # Storing artifacts and attaching them to the model run
    # mlflow.log_artifact(metrics_df.to_csv(index=False), "metrics_df.csv")
    f, axes = plt.subplots(1, 2, figsize=(20, 5))
    # plot_confusion_matrix/plot_roc_curve were removed in scikit-learn 1.2; the Display classes replace them
    ConfusionMatrixDisplay.from_estimator(xgb_model, X_test, y_test, ax=axes[0])
    RocCurveDisplay.from_estimator(xgb_model, X_test, y_test, ax=axes[1])
    mlflow.log_figure(f, 'confusion_matrix_roc_curve.png')
    plt.close('all')