04-10-2024 10:57 PM - edited 04-10-2024 11:00 PM
Hi, I have successfully registered my model using the feature engineering client with the following code:
with mlflow.start_run():
    # Calculate the ratio of negative class samples to positive class samples
    ratio = (len(y_train) - y_train.sum()) / y_train.sum()
    # Fit model
    xgb_model = xgb.XGBClassifier(scale_pos_weight=ratio)
    xgb_model.fit(X_train, y_train)
    fe.log_model(
        model=xgb_model,
        artifact_path=MODEL_NAME,
        flavor=mlflow.sklearn,
        training_set=training_set,
        registered_model_name=MODEL_NAME
    )
There are two questions:
1. Why is the model still shown as pyfunc in the model registry when the flavor I specified was mlflow.sklearn?
2. Can I use the following code for prediction:
model = mlflow.sklearn.load_model(model_version_uri)
# Predict with model
prob_pred = model.predict_proba(df)[:, 1]
or must I use score_batch()? I need the predictions to be probabilities rather than 1/0 labels.
Thanks!
#model_flavor #feature_store #score_batch #xgboost #sklearn
04-30-2024 11:46 AM
Hello @Edna
Thank you for contacting Databricks community support.
MLflow allows you to save models using different "flavors," which are essentially different ways of serializing and deserializing models. When you specify flavor=mlflow.sklearn, you're telling MLflow to save the model using the scikit-learn flavor.
However, when you register the model in the model registry, MLflow will automatically create a pyfunc version of the model in addition to the scikit-learn version. This is because pyfunc is a generic flavor that can be used to load and serve models in a variety of environments, regardless of the flavor used to save the model.
So even though you specified flavor=mlflow.sklearn, the model will still be shown as pyfunc in the model registry. This is expected behavior and allows the model to be easily deployed in a variety of environments.
If you want to deploy the model using the scikit-learn flavor specifically, you can do so by specifying the flavor when you load the model from the registry. For example:
import mlflow
import xgboost as xgb

# Load the model using the scikit-learn flavor
model = mlflow.sklearn.load_model(f"models:/{MODEL_NAME}/1")

# Use the model to make predictions
predictions = model.predict(X_test)
In this example, mlflow.sklearn.load_model() is used to load the model using the scikit-learn flavor, even though the model is registered as a pyfunc in the model registry.
07-19-2024 12:33 AM
Hi @Kumaran! Thank you for this response! Unfortunately, I find that the same approach does not work with a CatBoost model, even though the mlflow.catboost flavour is supported by MLflow. Could you help me with this?
These are the libs I'm using:
%pip install 'catboost==1.2.5' -q
%pip install 'databricks-feature-engineering==0.6.0' -q
%pip install 'mlflow==2.14.3' -q
%pip install 'shap==0.44.0' -q
I log the model with:
with mlflow.start_run():
    fe.log_model(
        model=model,
        artifact_path='model',
        flavor=mlflow.catboost,
        training_set=training_set,
        registered_model_name=model_uc_name,
        signature=signature,
        input_example=X_train.head(1)
    )
I load it like this:
best_model = mlflow.catboost.load_model(model_uri)
And I get this error:
MlflowException: Model does not have the "catboost" flavor.
And I need to use the FE client to use your cool Feature Lookups. Please help, I'd really appreciate it!
Cheers!!
07-22-2024 01:50 AM
@Edna unfortunately, it seems that the only way to load a model logged with the Feature Store client for batch scoring is fe.score_batch(model_uri, df).
If you need the model to predict probabilities, you could log a custom pyfunc wrapper, i.e. a subclass of mlflow.pyfunc.PythonModel (https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#pyfunc-create-custom), whose predict() method returns the result of model.predict_proba().
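To check what the "Model does not have the "catboost" flavor" error above is complaining about, it may help to list the flavors recorded in the registered model's MLmodel file. This is only a rough sketch: it assumes mlflow.models.get_model_info() is available in your MLflow version (it is in MLflow 2.x), and the helper names native_flavors and list_model_flavors are made up for illustration.

```python
from typing import Iterable, List


def native_flavors(flavors: Iterable[str]) -> List[str]:
    """Return the flavor names other than the generic pyfunc flavor.

    "python_function" is the name MLflow uses for the pyfunc flavor
    in the MLmodel file.
    """
    return sorted(f for f in flavors if f != "python_function")


def list_model_flavors(model_uri: str) -> List[str]:
    """Read the flavor names recorded for the model at model_uri."""
    # Imported lazily so native_flavors() can be used without MLflow installed.
    import mlflow

    info = mlflow.models.get_model_info(model_uri)
    return sorted(info.flavors.keys())


# Hypothetical usage inside a notebook where model_uri is already defined:
# flavors = list_model_flavors(model_uri)
# print(flavors, native_flavors(flavors))
```

If native_flavors() comes back empty, the model only exposes the pyfunc flavor, so a flavor-specific loader such as mlflow.catboost.load_model() would be expected to fail with the exception shown above.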
07-22-2024 02:29 AM
Thanks for your reply @robbe - yes, I have created a custom pyfunc model, which I can now use with fe.score_batch() to return probabilities. Here is the code:
# Calculate the ratio of negative class samples to positive class samples
ratio = (len(y_train) - y_train.sum()) / y_train.sum()

# Fit model
xgb_model = xgb.XGBClassifier(scale_pos_weight=ratio, enable_categorical=True)
xgb_model.fit(X_train, y_train)
y_probs = xgb_model.predict_proba(X_test)
y_pred = pd.Series([1 if prob > 0.5 else 0 for prob in y_probs[:, 1]], index=y_test.index)

class churnProbability(mlflow.pyfunc.PythonModel):
    def __init__(self, trained_model):
        self.model = trained_model

    def preprocess_result(self, model_input):
        return model_input

    def predict(self, context, model_input):
        processed_df = self.preprocess_result(model_input.copy())
        processed_df["utility_code"] = processed_df["utility_code"].astype("category")
        processed_df["payment_method_name"] = processed_df["payment_method_name"].astype("category")
        results = self.model.predict_proba(processed_df)
        # Return only the probability of the positive class
        return results[:, 1]

pyfunc_model = churnProbability(xgb_model)
# End the current MLflow run and start a new one to log the new pyfunc model
mlflow.end_run()
with mlflow.start_run() as run:
    fe.log_model(
        model=pyfunc_model,
        artifact_path=MODEL_NAME,
        flavor=mlflow.pyfunc,
        training_set=training_set,
        registered_model_name=MODEL_NAME,
    )

    # Logging relevant metrics for experiment run comparison and for posterity
    # ROC-AUC is computed from the positive-class probabilities rather than the thresholded labels
    mlflow.log_metrics({'Precision Score': precision_score(y_test, y_pred),
                        'Recall Score': recall_score(y_test, y_pred),
                        'ROC-AUC Score': roc_auc_score(y_test, y_probs[:, 1])})

    # Storing artifacts and attaching them to the model run
    # mlflow.log_artifact(metrics_df.to_csv(index=False), "metrics_df.csv")
    # Note: plot_confusion_matrix and plot_roc_curve were removed in scikit-learn 1.2;
    # newer versions provide ConfusionMatrixDisplay.from_estimator and RocCurveDisplay.from_estimator
    f, axes = plt.subplots(1, 2, figsize=(20, 5))
    plot_confusion_matrix(xgb_model, X_test, y_test, ax=axes[0])
    plot_roc_curve(xgb_model, X_test, y_test, ax=axes[1])
    mlflow.log_figure(f, 'confusion_matrix_roc_curve.png')
    plt.close('all')
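For anyone finding this later, scoring with the wrapped model could look roughly like the sketch below. It assumes fe is a FeatureEngineeringClient and df is a Spark DataFrame containing the lookup keys; the function name score_probabilities is hypothetical, and the model version in the usage comment is only a placeholder.

```python
def score_probabilities(fe, model_uri, df):
    """Score a DataFrame with a model that was logged via fe.log_model().

    Assumes `fe` exposes score_batch(model_uri=..., df=..., result_type=...),
    as the Databricks feature engineering client does.
    """
    # result_type describes the type of the values returned by the model;
    # "double" suits a wrapper whose predict() returns one probability per row.
    return fe.score_batch(model_uri=model_uri, df=df, result_type="double")


# Hypothetical usage in a Databricks notebook:
# scored = score_probabilities(fe, f"models:/{MODEL_NAME}/1", batch_df)
# scored.select("prediction").show()
```

Because the custom predict() returns results[:, 1], the scored output should carry the positive-class probability instead of a 1/0 label.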