Handling Null Values in Feature Stores

Hi, I am using multiple feature stores in my workflow via feature lookups. My logged pipeline has several stages: Assembler, Standard Scaler, Indexer, and then the model. However, I am running into an issue during inference with the `score_batch` function.

If an identifier does not have all of its pre-computed values in the feature stores, the join performed by the feature lookups assigns a null value, and that null is then passed directly to the model in the `score_batch` function. Is there any way to handle this? So far I have tried the following:

1. Defining a custom transformer as the first stage of my pipeline to handle such columns. To use it properly, though, I would have to log this additional code along with my model. MLflow supports this through the `code_path` parameter, but the feature store `log_model` method does not provide this parameter.
2. The feature store provides `FeatureFunction` for computing on-demand features, but it is meant for adding additional columns to the resulting dataframe. Can we leverage it to handle null values in some columns by defining logic in the function that replaces the nulls with fallback values?
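
To make the second idea concrete, here is a rough plain-Python sketch of the null-replacement logic I have in mind. It only mimics the lookup join with dictionaries and does not call any Feature Store API; the names `fill_null`, `lookup_with_defaults`, `DEFAULTS`, and the columns `avg_spend` and `segment` are all made up for illustration.

```python
from typing import Any, Dict, Hashable, List, Optional

# Hypothetical per-column fallbacks; in practice these might be
# training-set means or most frequent categories.
DEFAULTS: Dict[str, Any] = {"avg_spend": 0.0, "segment": "unknown"}


def fill_null(value: Optional[Any], default: Any) -> Any:
    """Return `default` when the looked-up feature value is missing (None)."""
    return default if value is None else value


def lookup_with_defaults(
    keys: List[Hashable],
    feature_table: Dict[Hashable, Dict[str, Any]],
    defaults: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Mimic a left join on the lookup key, then replace nulls per column."""
    rows = []
    for key in keys:
        # A key absent from the feature table yields no features at all,
        # which is the case where the real join would produce nulls.
        found = feature_table.get(key, {})
        row = {"id": key}
        for column, default in defaults.items():
            row[column] = fill_null(found.get(column), default)
        rows.append(row)
    return rows


# Toy feature table: id 2 has one null feature, id 3 is missing entirely.
feature_table = {
    1: {"avg_spend": 12.5, "segment": "gold"},
    2: {"avg_spend": None, "segment": "silver"},
}
scored_input = lookup_with_defaults([1, 2, 3], feature_table, DEFAULTS)
```

My assumption is that if something like `fill_null` were registered as the function behind a `FeatureFunction`, the pipeline would have to read the filled output column rather than the raw looked-up one, since the function adds a column instead of overwriting the original.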

Thanks.