Anonymous

@Stephen Wylie:

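One way to handle missing keys during batch inference is to join the lookup keys against the feature table and keep only the keys that have a match. This filters out unknown keys before they reach score_batch(), so you avoid the NaN feature values and the need for dropna(). A left semi join does exactly this: it returns the rows of the lookup DataFrame that have a match in the feature table, without adding any feature columns. (A left outer join followed by a filter on the left-side key would not work, because that key is never null, so every lookup key would survive.) Here's an example in PySpark: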

from databricks.feature_store import FeatureStoreClient

# spark is the SparkSession that Databricks notebooks provide
fs = FeatureStoreClient()

# define the lookup keys you want to score
lookup_keys = ["key1", "key2", "key3", "key4", "key5"]

# create a DataFrame with the lookup keys
lookup_df = spark.createDataFrame([(key,) for key in lookup_keys], ["lookup_key"])

# read only the key column of the feature table
feature_keys_df = spark.read.format("delta").load("path/to/feature/table").select("lookup_key")

# left semi join: keep only the lookup keys that exist in the feature table
score_df = lookup_df.join(feature_keys_df, on="lookup_key", how="left_semi")

# score_batch() looks up the features itself and takes a model URI;
# "model_name" and version 1 are placeholders for your registered model
predictions = fs.score_batch("models:/model_name/1", score_df)

This filters out unknown keys before score_batch() runs, so you no longer need to fetch full feature rows and call dropna(). If you also want to know which keys were skipped, the same join with how="left_anti" returns exactly those keys.
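To make the difference between the join types concrete, here is a plain-Python sketch on toy data. The keys, feature values, and helper functions are hypothetical and only mimic Spark's left outer, left semi, and left anti join semantics for illustration; they are not part of any Databricks or PySpark API.

```python
# Toy illustration of join semantics (hypothetical keys and feature values).
lookup_keys = ["key1", "key2", "key3", "key4", "key5"]

# Keys present in a hypothetical feature table, mapped to a toy feature value
feature_rows = {"key1": 0.5, "key3": 1.2, "key5": 0.9}


def left_outer(keys, features):
    """Every lookup key survives; a missing feature becomes None (NaN in Spark/pandas)."""
    return [(k, features.get(k)) for k in keys]


def left_semi(keys, features):
    """Only keys with a match in the feature table survive; no feature columns are added."""
    return [k for k in keys if k in features]


def left_anti(keys, features):
    """Only keys without a match survive: the unknown keys that get skipped."""
    return [k for k in keys if k not in features]


outer = left_outer(lookup_keys, feature_rows)

# Filtering on the left-side key removes nothing, because that key is never None
outer_filtered = [(k, v) for k, v in outer if k is not None]

print(len(outer_filtered))                   # 5
print(left_semi(lookup_keys, feature_rows))  # ['key1', 'key3', 'key5']
print(left_anti(lookup_keys, feature_rows))  # ['key2', 'key4']
```

The outer join keeps all five keys (two of them with missing features), while the semi join keeps only the three scorable keys and the anti join lists the two that were dropped.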