TrainingSet schema difference during training and inference
08-12-2024 07:44 AM
Hi,
I'm using the Feature Store to train an ml model and log it using MLflow and FeatureStoreClient(). This model is then used for inference.
I understand the schema of the TrainingSet should not differ between training time and inference time. However, during training, an additional "weight" column is required to guide the model's learning process. These weights are not available during inference time when using score_batch().
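For context, a rough sketch of my flow (table, key, column, and model names below are placeholders, not my actual setup):

from databricks.feature_store import FeatureStoreClient, FeatureLookup
from sklearn.linear_model import LogisticRegression
import mlflow

fs = FeatureStoreClient()

# labels_df: Spark DataFrame with the lookup key and the label (placeholder names)
training_set = fs.create_training_set(
    df=labels_df,
    feature_lookups=[
        FeatureLookup(table_name="my_db.customer_features", lookup_key="customer_id")
    ],
    label="label",
)
train_df = training_set.load_df().toPandas()

model = LogisticRegression()
model.fit(
    train_df.drop(columns=["label"]),
    train_df["label"],
    # sample_weight=train_df["weight"],  # <-- the column I only need at training time
)

# Log the model together with the feature metadata
fs.log_model(
    model,
    artifact_path="model",
    flavor=mlflow.sklearn,
    training_set=training_set,
    registered_model_name="weighted_model",
)

# At inference, batch_df only carries the lookup keys; there is no "weight" column
predictions = fs.score_batch("models:/weighted_model/1", batch_df)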
I'm trying to find a clean work-around for this schema difference, while still using the Feature Store. I tried:
- Including the "weight" column in create_training_set() for training --> Not possible, the column is not available during inference.
- Joining the "weight" column after create_training_set() during training --> Not possible, keys are dropped in the TrainingSet.
- Dropping the "weight" column after create_training_set() --> I can't find a method to drop it completely from the TrainingSet.
Any suggestions?
08-13-2024 03:15 PM - edited 08-13-2024 03:17 PM
Hi @Quinten,
- Create a new feature group with the same schema as your TrainingSet, but with an additional "weight" column.
- During training, join the TrainingSet with the new feature group to add the "weight" column.
- After training, you can drop the "weight" column from the TrainingSet using the drop_columns method provided by the FeatureStoreClient.
- During inference, you can use the original TrainingSet without the "weight" column.
Here's some sample code to illustrate the steps:
# Create a new feature group with the "weight" column
weight_feature_group = fs.create_feature_group(
    name="weight_feature_group",
    table_name="weight_feature_group_table",
    primary_keys=["primary_key_column"],
    schema={
        "primary_key_column": "string",
        "weight": "double"
    }
)

# Join the TrainingSet with the new feature group during training
training_set_with_weight = training_set.join(weight_feature_group, on="primary_key_column")

# Drop the "weight" column from the TrainingSet after training
training_set = training_set.drop_columns(["weight"])

# Use the original TrainingSet without the "weight" column during inference
inference_set = fs.get_historical_features(feature_group_names=["inference_feature_group"])
This approach allows you to keep the schema of the TrainingSet consistent between training and inference time while still using the Feature Store.
08-16-2024 05:23 AM
Thanks for the response, @KumaranT.
Unfortunately, training_set has no attribute 'join'. For that to work, you would first need to load the DataFrame using training_set.load_df(). However, that DataFrame contains no primary keys, so joining on keys is not possible. Or am I missing something?
I created a work-around by joining on the index, but it is not a clean solution.
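For reference, the index join looks roughly like this (column names assumed, and weights_df is just whatever DataFrame holds the weights). It relies on row order, which Spark does not guarantee, hence "not clean":

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Attach a positional index to both DataFrames and join on it
w = Window.orderBy(F.monotonically_increasing_id())
train_df = training_set.load_df().withColumn("row_idx", F.row_number().over(w))
weights_idx = weights_df.withColumn("row_idx", F.row_number().over(w))

# Only works if both DataFrames happen to be in the same row order
train_with_weight = (
    train_df.join(weights_idx.select("row_idx", "weight"), on="row_idx")
    .drop("row_idx")
)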

