<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: TrainingSet schema difference during training and inference in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/trainingset-schema-difference-during-training-and-inference/m-p/82903#M3573</link>
    <description></description>
    <pubDate>Tue, 13 Aug 2024 22:17:26 GMT</pubDate>
    <dc:creator>KumaranT</dc:creator>
    <dc:date>2024-08-13T22:17:26Z</dc:date>
    <item>
      <title>TrainingSet schema difference during training and inference</title>
      <link>https://community.databricks.com/t5/machine-learning/trainingset-schema-difference-during-training-and-inference/m-p/82758#M3570</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm using the Feature Store to train an ML model and log it using MLflow and FeatureStoreClient(). This model is then used for inference.&lt;/P&gt;&lt;P&gt;I understand that the schema of the TrainingSet should not differ between training time and inference time. &lt;SPAN&gt;However, during training, an additional "weight" column is required to guide the model's learning process.&lt;/SPAN&gt; These weights are not available at inference time when using score_batch().&lt;/P&gt;&lt;P&gt;I'm trying to find a clean workaround for this schema difference while still using the Feature Store. I tried:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Including the "weight" column in create_training_set() for training --&amp;gt; Not possible, the column is not available during inference.&lt;/LI&gt;&lt;LI&gt;Joining the "weight" column after create_training_set() during training --&amp;gt; Not possible, the keys are dropped in the TrainingSet.&lt;/LI&gt;&lt;LI&gt;Dropping the "weight" column after create_training_set() --&amp;gt; I can't find a method to drop it completely from the TrainingSet.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Any suggestions?&lt;/P&gt;</description>
      <pubDate>Mon, 12 Aug 2024 14:44:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/trainingset-schema-difference-during-training-and-inference/m-p/82758#M3570</guid>
      <dc:creator>Quinten</dc:creator>
      <dc:date>2024-08-12T14:44:23Z</dc:date>
    </item>
    <item>
      <title>Re: TrainingSet schema difference during training and inference</title>
      <link>https://community.databricks.com/t5/machine-learning/trainingset-schema-difference-during-training-and-inference/m-p/82903#M3573</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/115731"&gt;@Quinten&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;You can consider creating a custom feature group to store the "weight" column during training. This way, you can keep the schema of the TrainingSet consistent between training and inference time.&lt;/P&gt;&lt;P&gt;Here are the steps you can follow:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Create a new feature group with the same schema as your TrainingSet, but with an additional "weight" column.&lt;/LI&gt;&lt;LI&gt;During training, join the TrainingSet with the new feature group to add the "weight" column.&lt;/LI&gt;&lt;LI&gt;After training, you can drop the "weight" column from the TrainingSet using the drop_columns method provided by the FeatureStoreClient.&lt;/LI&gt;&lt;LI&gt;During inference, you can use the original TrainingSet without the "weight" column.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Here's some sample code to illustrate the steps:&lt;/P&gt;&lt;PRE&gt;# Create a new feature group with the "weight" column
weight_feature_group = fs.create_feature_group(
    name="weight_feature_group",
    table_name="weight_feature_group_table",
    primary_keys=["primary_key_column"],
    schema={
        "primary_key_column": "string",
        "weight": "double"
    }
)

# Join the TrainingSet with the new feature group during training
training_set_with_weight = training_set.join(weight_feature_group, on="primary_key_column")

# Drop the "weight" column from the TrainingSet after training
training_set = training_set.drop_columns(["weight"])

# Use the original TrainingSet without the "weight" column during inference
inference_set = fs.get_historical_features(feature_group_names=["inference_feature_group"])&lt;/PRE&gt;&lt;P&gt;This approach allows you to keep the schema of the TrainingSet consistent between training and inference time while still using the Feature Store.&lt;/P&gt;</description>
      <pubDate>Tue, 13 Aug 2024 22:17:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/trainingset-schema-difference-during-training-and-inference/m-p/82903#M3573</guid>
      <dc:creator>KumaranT</dc:creator>
      <dc:date>2024-08-13T22:17:26Z</dc:date>
    </item>
    <item>
      <title>Re: TrainingSet schema difference during training and inference</title>
      <link>https://community.databricks.com/t5/machine-learning/trainingset-schema-difference-during-training-and-inference/m-p/83223#M3584</link>
      <description>&lt;P&gt;Thanks for the response,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/115880"&gt;@KumaranT&lt;/a&gt;.&lt;/P&gt;&lt;P&gt;Unfortunately, training_set has no attribute 'join'. For that to work, you would first need to load the DataFrame using training_set.load_df(). However, this DataFrame contains no primary keys, so joining on keys is not possible. Or am I missing something?&lt;/P&gt;&lt;P&gt;I created a workaround by joining on the index, but it is not a clean solution.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Aug 2024 12:23:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/trainingset-schema-difference-during-training-and-inference/m-p/83223#M3584</guid>
      <dc:creator>Quinten</dc:creator>
      <dc:date>2024-08-16T12:23:26Z</dc:date>
    </item>
  </channel>
</rss>

