Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Save model from AutoML to MLflow in LightGBM flavor

dkxxx-rc
New Contributor III

I want to get the LightGBM built-in variable importance values from a model that was generated by AutoML.  That's not logged in the metrics by default - can I change a setting so that it will be logged?

More fundamentally:  what I'd really like is to modify the LightGBM notebook generated by AutoML so that it logs the model to MLflow with flavor.loader_module equal to mlflow.lightgbm.  By default, it logs with that parameter equal to mlflow.sklearn.

As a result, if I want to load that model back in for new predictions, I have to use model = mlflow.pyfunc.load_model(), and that loads the model in a generic format that doesn't include useful LightGBM things like model.feature_importances.

I want to be able to load the model back in via model = mlflow.lightgbm.load_model(), but currently that generates an error because the model wasn't saved in the right format for that.

Now, I know model = mlflow.lightgbm.load_model() succeeds on a different model that I originally saved in LightGBM flavor via mlflow.lightgbm.log_model().  But the AutoML notebook doesn't use load_model(), so I have to look further for a way to force LightGBM flavor.

In that vein, I did find a command pyfunc.add_to_model(mlflow_model, loader_module="mlflow.sklearn") in the notebook.  Sadly, changing it to loader_module="mlflow.lightgbm" had no discernible effect on the problem.  The model saved to MLflow still had flavor.loader_module equal to mlflow.sklearn.

Accepted Solution

Alberto_Umana
Databricks Employee

To address your concerns about logging LightGBM feature importance and modifying the AutoML-generated LightGBM model to use the mlflow.lightgbm flavor, you'll need to make some changes to the AutoML notebook. Here's an approach to achieve what you're looking for:

Logging Feature Importance

LightGBM's feature importance is not logged by default in MLflow's autologging. To log this information, you can manually add it to the MLflow run after the model is trained. Here's how you can do this:

 

import mlflow

# Assuming 'model' is your trained LightGBM model (the native Booster API;
# for the scikit-learn wrapper, use model.booster_.feature_importance(...)
# or the model.feature_importances_ attribute instead)
feature_importance = model.feature_importance(importance_type='gain')
feature_names = model.feature_name()

# Log one metric per feature
for feature, importance in zip(feature_names, feature_importance):
    mlflow.log_metric(f"feature_importance_{feature}", importance)
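One caveat with the per-feature-metric approach: MLflow metric keys only allow alphanumerics, underscores, dashes, periods, spaces, and slashes, so raw column names (e.g. containing brackets or colons) need sanitizing first. A minimal sketch of that step; the helper name and prefix are illustrative, not part of AutoML:

```python
import re

def importance_metrics(feature_names, gains, prefix="feature_importance"):
    """Build MLflow-safe metric names for per-feature importance values.

    Characters outside MLflow's allowed set for metric keys
    (alphanumerics, '_', '-', '.', ' ', '/') are replaced with '_'.
    """
    metrics = {}
    for name, gain in zip(feature_names, gains):
        safe = re.sub(r"[^0-9A-Za-z_\-. /]", "_", name)
        metrics[f"{prefix}_{safe}"] = float(gain)
    return metrics

# Example with a column name MLflow would otherwise reject:
metrics = importance_metrics(["age", "income[usd]"], [12.5, 3.0])
print(metrics)
# {'feature_importance_age': 12.5, 'feature_importance_income_usd_': 3.0}
```

Each entry of the returned dict can then be passed to mlflow.log_metric() as in the loop above.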

 

Changing Model Flavor to LightGBM

To log the model with the LightGBM flavor instead of scikit-learn, you need to modify the model logging process in the AutoML notebook. Here's how you can do it:

  1. Find the part of the notebook where the model is being logged to MLflow.
  2. Replace the existing logging code with mlflow.lightgbm.log_model(). Here's an example:

 

import mlflow.lightgbm

# Assuming 'model' is your trained LightGBM model
mlflow.lightgbm.log_model(model, "model", registered_model_name="your_model_name")

 

  3. If you're using a pipeline that includes preprocessing steps, you'll need to log the LightGBM model separately from the pipeline. You can do this by extracting the LightGBM model from the pipeline:

 

lightgbm_model = pipeline.named_steps['lightgbm']
mlflow.lightgbm.log_model(lightgbm_model, "lightgbm_model")

 

  4. You may also need to log the preprocessing steps separately if they're required for making predictions.
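Once the preprocessor and the LightGBM model are logged as separate artifacts, prediction becomes a two-step call. A sketch of that split-pipeline serving pattern, with stand-in classes replacing the real fitted transformer and Booster so the shape of the code is runnable here (all names are illustrative):

```python
class StubPreprocessor:
    """Plays the role of the pipeline's fitted preprocessing step
    (in practice loaded via mlflow.sklearn.load_model)."""
    def transform(self, rows):
        # e.g. scaling/encoding; here just a pass-through copy
        return [list(r) for r in rows]

class StubBooster:
    """Plays the role of the extracted LightGBM model
    (in practice loaded via mlflow.lightgbm.load_model)."""
    def predict(self, rows):
        return [sum(r) for r in rows]

preprocessor = StubPreprocessor()
booster = StubBooster()

# Two-step prediction: transform first, then predict
features = preprocessor.transform([[1, 2], [3, 4]])
predictions = booster.predict(features)
print(predictions)  # [3, 7]
```

The point is only the call shape: whatever served the full pipeline before must now invoke transform() and predict() explicitly, in that order.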

 

Loading the Model

After making these changes, you should be able to load the model using:

 

loaded_model = mlflow.lightgbm.load_model("runs:/your_run_id/lightgbm_model")

This loaded model will expose the native LightGBM API, including feature_importance() on a Booster (or the feature_importances_ attribute if the logged model was the scikit-learn wrapper).
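For reference, the difference is visible in the MLmodel file stored with the model artifacts. A sketch of roughly what mlflow.lightgbm.log_model() writes (field values such as versions and filenames are illustrative): note that the python_function flavor's loader_module now points at mlflow.lightgbm, which is why mlflow.lightgbm.load_model() succeeds, whereas the AutoML-logged model carries loader_module: mlflow.sklearn instead.

```yaml
flavors:
  lightgbm:
    data: model.lgb
    lgb_version: 4.1.0
  python_function:
    data: model.lgb
    env: conda.yaml
    loader_module: mlflow.lightgbm
    python_version: 3.10.12
```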



Alberto_Umana
Databricks Employee

Additional Considerations

  • The pyfunc.add_to_model() function you mentioned is used to add the Python Function flavor to the model, which is different from changing the primary flavor of the logged model. That's why changing its parameter didn't solve the issue.
  • If you need to maintain compatibility with the existing AutoML pipeline, you might consider logging the model twice: once with the scikit-learn flavor for the pipeline, and once with the LightGBM flavor for accessing LightGBM-specific features.
  • Remember to test these changes thoroughly, as they may affect how the model is used in production environments that expect the scikit-learn flavor.
