<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Model flavour using feature store model training log_model() in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/model-flavour-using-feature-store-model-training-log-model/m-p/66047#M3183</link>
    <description>&lt;P&gt;Hi, I have successfully registered my model using the feature engineering client with the following code:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;with mlflow.start_run():
    # Calculate the ratio of negative class samples to positive class samples
    ratio = (len(y_train) - y_train.sum()) / y_train.sum()

    # Fit model
    xgb_model = xgb.XGBClassifier(scale_pos_weight=ratio)
    xgb_model.fit(X_train, y_train)

    fe.log_model(
      model=xgb_model,
      artifact_path=MODEL_NAME,
      flavor=mlflow.sklearn,
      training_set=training_set,
      registered_model_name=MODEL_NAME
    )&lt;/LI-CODE&gt;&lt;P&gt;I have two questions:&lt;/P&gt;&lt;P&gt;1. Why is the model still shown as pyfunc in the model registry when the flavor I specified was mlflow.sklearn?&lt;/P&gt;&lt;P&gt;2.&amp;nbsp;Can I use the following code for prediction:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;model = mlflow.sklearn.load_model(model_version_uri)

# Predict with model
prob_pred = model.predict_proba(df)[:, 1]&lt;/LI-CODE&gt;&lt;P&gt;or must I use score_batch()? I need the predictions to be probabilities instead of 1/0s.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;#model_flavor #feature_store #score_batch #xgboost #sklearn&lt;/P&gt;</description>
    <pubDate>Thu, 11 Apr 2024 06:00:22 GMT</pubDate>
    <dc:creator>Edna</dc:creator>
    <dc:date>2024-04-11T06:00:22Z</dc:date>
    <item>
      <title>Model flavour using feature store model training log_model()</title>
      <link>https://community.databricks.com/t5/machine-learning/model-flavour-using-feature-store-model-training-log-model/m-p/66047#M3183</link>
      <description>&lt;P&gt;Hi, I have successfully registered my model using the feature engineering client with the following code:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;with mlflow.start_run():
    # Calculate the ratio of negative class samples to positive class samples
    ratio = (len(y_train) - y_train.sum()) / y_train.sum()

    # Fit model
    xgb_model = xgb.XGBClassifier(scale_pos_weight=ratio)
    xgb_model.fit(X_train, y_train)

    fe.log_model(
      model=xgb_model,
      artifact_path=MODEL_NAME,
      flavor=mlflow.sklearn,
      training_set=training_set,
      registered_model_name=MODEL_NAME
    )&lt;/LI-CODE&gt;&lt;P&gt;I have two questions:&lt;/P&gt;&lt;P&gt;1. Why is the model still shown as pyfunc in the model registry when the flavor I specified was mlflow.sklearn?&lt;/P&gt;&lt;P&gt;2.&amp;nbsp;Can I use the following code for prediction:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;model = mlflow.sklearn.load_model(model_version_uri)

# Predict with model
prob_pred = model.predict_proba(df)[:, 1]&lt;/LI-CODE&gt;&lt;P&gt;or must I use score_batch()? I need the predictions to be probabilities instead of 1/0s.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;#model_flavor #feature_store #score_batch #xgboost #sklearn&lt;/P&gt;</description>
      <pubDate>Thu, 11 Apr 2024 06:00:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/model-flavour-using-feature-store-model-training-log-model/m-p/66047#M3183</guid>
      <dc:creator>Edna</dc:creator>
      <dc:date>2024-04-11T06:00:22Z</dc:date>
    </item>
    <item>
      <title>Re: Model flavour using feature store model training log_model()</title>
      <link>https://community.databricks.com/t5/machine-learning/model-flavour-using-feature-store-model-training-log-model/m-p/67733#M3231</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/91421"&gt;@Edna&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you for contacting Databricks community support.&lt;/P&gt;
&lt;P&gt;MLflow allows you to save models using different "flavors," which are essentially different ways of serializing and deserializing models. When you specify&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;flavor=mlflow.sklearn&lt;/CODE&gt;&lt;/SPAN&gt;, you're telling MLflow to save the model using the scikit-learn flavor.&lt;/P&gt;
&lt;P&gt;However, when you register the model in the model registry, MLflow will automatically create a&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;pyfunc&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;version of the model in addition to the scikit-learn version. This is because&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;pyfunc&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is a generic flavor that can be used to load and serve models in a variety of environments, regardless of the flavor used to save the model.&lt;/P&gt;
&lt;P&gt;So even though you specified&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;flavor=mlflow.sklearn&lt;/CODE&gt;&lt;/SPAN&gt;, the model will still be shown as&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;pyfunc&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in the model registry. This is expected behavior and allows the model to be easily deployed in a variety of environments.&lt;/P&gt;
&lt;P&gt;If you want to deploy the model using the scikit-learn flavor specifically, you can do so by specifying the flavor when you load the model from the registry. For example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;&lt;SPAN class="token token"&gt;import&lt;/SPAN&gt;&lt;SPAN&gt; mlflow
&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;import&lt;/SPAN&gt;&lt;SPAN&gt; xgboost &lt;/SPAN&gt;&lt;SPAN class="token token"&gt;as&lt;/SPAN&gt;&lt;SPAN&gt; xgb
&lt;/SPAN&gt;
&lt;SPAN class="token token"&gt;Load the model using the scikit-learn flavor&lt;/SPAN&gt;
&lt;SPAN&gt;model &lt;/SPAN&gt;&lt;SPAN class="token token"&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; mlflow&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;sklearn&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;load_model&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token string-interpolation"&gt;f"models:/&lt;/SPAN&gt;&lt;SPAN class="token token string-interpolation interpolation"&gt;{&lt;/SPAN&gt;&lt;SPAN class="token token string-interpolation interpolation"&gt;MODEL_NAME&lt;/SPAN&gt;&lt;SPAN class="token token string-interpolation interpolation"&gt;}&lt;/SPAN&gt;&lt;SPAN class="token token string-interpolation"&gt;/1"&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;)&lt;/SPAN&gt;

&lt;SPAN class="token token"&gt;Use the model to make predictions&lt;/SPAN&gt;
&lt;SPAN&gt;predictions &lt;/SPAN&gt;&lt;SPAN class="token token"&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; model&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;predict&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;X_test&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;)&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;In this example,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;mlflow.sklearn.load_model()&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is used to load the model using the scikit-learn flavor, even though the model is registered as a&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="du-bois-light-typography css-v80wf5"&gt;&lt;CODE&gt;pyfunc&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in the model registry.&lt;/P&gt;
</description>
      <pubDate>Tue, 30 Apr 2024 18:46:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/model-flavour-using-feature-store-model-training-log-model/m-p/67733#M3231</guid>
      <dc:creator>Kumaran</dc:creator>
      <dc:date>2024-04-30T18:46:29Z</dc:date>
    </item>
    <item>
      <title>Re: Model flavour using feature store model training log_model()</title>
      <link>https://community.databricks.com/t5/machine-learning/model-flavour-using-feature-store-model-training-log-model/m-p/79347#M3443</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/63081"&gt;@Kumaran&lt;/a&gt;! Thank you for this response! Unfortunately, I find that the same approach does not work with a CatBoost model, even though the mlflow.catboost flavour is supported by MLflow. Could you help me with this?&lt;BR /&gt;These are the libs I'm using:&lt;/P&gt;&lt;PRE&gt;%pip install 'catboost==1.2.5' -q &lt;BR /&gt;%pip install 'databricks-feature-engineering==0.6.0' -q &lt;BR /&gt;%pip install 'mlflow==2.14.3' -q &lt;BR /&gt;%pip install 'shap==0.44.0' -q&lt;/PRE&gt;&lt;P&gt;I log the model with:&lt;/P&gt;&lt;PRE&gt;with mlflow.start_run():&lt;BR /&gt;  fe.log_model(&lt;BR /&gt;    model = model, &lt;BR /&gt;    artifact_path = 'model',&lt;BR /&gt;    flavor = mlflow.catboost,&lt;BR /&gt;    training_set = training_set,&lt;BR /&gt;    registered_model_name = model_uc_name,&lt;BR /&gt;    signature = signature,&lt;BR /&gt;    input_example = X_train.head(1)&lt;BR /&gt;  )&lt;/PRE&gt;&lt;P&gt;I load it like this:&lt;/P&gt;&lt;PRE&gt;best_model = mlflow.catboost.load_model(model_uri)&lt;/PRE&gt;&lt;P&gt;And I get this error:&lt;/P&gt;&lt;PRE&gt;MlflowException: Model does not have the "catboost" flavor.&lt;/PRE&gt;&lt;P&gt;And I need to use the FE client to use your cool Feature Lookups. Please help, I'd really appreciate it!&lt;/P&gt;&lt;P&gt;Cheers!!&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jul 2024 07:33:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/model-flavour-using-feature-store-model-training-log-model/m-p/79347#M3443</guid>
      <dc:creator>MiStankai</dc:creator>
      <dc:date>2024-07-19T07:33:40Z</dc:date>
    </item>
    <item>
      <title>Re: Model flavour using feature store model training log_model()</title>
      <link>https://community.databricks.com/t5/machine-learning/model-flavour-using-feature-store-model-training-log-model/m-p/79829#M3450</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/91421"&gt;@Edna&lt;/a&gt; Unfortunately, it seems that the only way to load a model logged with the Feature Store client for batch scoring is to use fe.score_batch(model_uri, df).&lt;/P&gt;&lt;P&gt;If you need the model to predict probabilities, you could log a custom pyfunc wrapper, i.e. a subclass of mlflow.pyfunc.PythonModel (&lt;A href="https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#pyfunc-create-custom" target="_blank"&gt;https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#pyfunc-create-custom&lt;/A&gt;), whose predict() method returns the result of model.predict_proba().&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 08:50:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/model-flavour-using-feature-store-model-training-log-model/m-p/79829#M3450</guid>
      <dc:creator>robbe</dc:creator>
      <dc:date>2024-07-22T08:50:02Z</dc:date>
    </item>
    <item>
      <title>Re: Model flavour using feature store model training log_model()</title>
      <link>https://community.databricks.com/t5/machine-learning/model-flavour-using-feature-store-model-training-log-model/m-p/79835#M3451</link>
      <description>&lt;P&gt;Thanks for your reply&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/102750"&gt;@robbe&lt;/a&gt;&amp;nbsp;- yes, I have created a custom pyfunc model that I can now use with fe.score_batch() to return probabilities. Here is the code:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;# Calculate the ratio of negative class samples to positive class samples
ratio = (len(y_train) - y_train.sum()) / y_train.sum()

# Fit model
xgb_model = xgb.XGBClassifier(scale_pos_weight=ratio, enable_categorical=True)
xgb_model.fit(X_train, y_train)

y_probs = xgb_model.predict_proba(X_test)
y_pred = pd.Series([1 if prob &amp;gt; 0.5 else 0 for prob in y_probs[:,1]], index=y_test.index)

class churnProbability(mlflow.pyfunc.PythonModel):
    def __init__(self, trained_model):
        self.model = trained_model

    def preprocess_result(self, model_input):
        return model_input

    def predict(self, context, model_input):
        processed_df = self.preprocess_result(model_input.copy())
        processed_df["utility_code"] = processed_df["utility_code"].astype("category")
        processed_df["payment_method_name"] = processed_df["payment_method_name"].astype("category")
        results = self.model.predict_proba(processed_df)
        return results[:,1]


pyfunc_model = churnProbability(xgb_model)

# End the current MLflow run and start a new one to log the new pyfunc model
mlflow.end_run()

with mlflow.start_run() as run:
    fe.log_model(
        model=pyfunc_model,
        artifact_path=MODEL_NAME,
        flavor=mlflow.pyfunc,
        training_set=training_set,
        registered_model_name=MODEL_NAME,
    )
    # Logging relevant metrics for experiment run comparison and for posterity
    mlflow.log_metrics({'Precision Score': precision_score(y_test, y_pred), 
                        'Recall Score': recall_score(y_test, y_pred), 
                        # ROC-AUC is computed from the predicted probabilities rather than the thresholded labels
                        'ROC-AUC Score': roc_auc_score(y_test, y_probs[:, 1])})
    
    # Storing artifacts and attaching them to the model run
    # mlflow.log_artifact(metrics_df.to_csv(index=False), "metrics_df.csv")
    f, axes = plt.subplots(1, 2, figsize=(20,5))
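    # plot_confusion_matrix and plot_roc_curve were removed in scikit-learn 1.2;
    # the Display classes from sklearn.metrics are their replacements
    ConfusionMatrixDisplay.from_estimator(xgb_model, X_test, y_test, ax=axes[0])
    RocCurveDisplay.from_estimator(xgb_model, X_test, y_test, ax=axes[1])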
    mlflow.log_figure(f, 'confusion_matrix_roc_curve.png')

plt.close('all')&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 22 Jul 2024 09:29:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/model-flavour-using-feature-store-model-training-log-model/m-p/79835#M3451</guid>
      <dc:creator>Edna</dc:creator>
      <dc:date>2024-07-22T09:29:35Z</dc:date>
    </item>
  </channel>
</rss>

