Hi!
I was wondering if any of you have ever dealt with feature tables and null values (more specifically, via feature engineering objects rather than the feature store, although I don't think it really matters).
In brief, null values are allowed in feature tables (as long as they aren't in the primary keys, of course), since some models (mainly the ones from the "tree family") can handle them.
However, the problem I'm facing now (the first time I've had null values in feature tables, to be frank) is how to retrieve the DataFrame once training time comes. I can define training_set_df without issues:
# fe here is the FeatureEngineeringClient instance
training_set = fe.create_training_set(
    df=label_df,
    feature_lookups=lookups_list,
    label="TARGET",
    exclude_columns=primary_keys
)
training_set_df = training_set.load_df()
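(As a side note, the lazy definition itself seems fine: load_df gives back a regular Spark DataFrame, so its declared schema can be inspected without triggering any action. Plain Spark, nothing feature-store specific:)

# Inspect the schema the lazy DataFrame reports; no action is triggered
training_set_df.printSchema()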
But that's just the lazy evaluation; as soon as I try to actually use training_set_df, for example:
display(
    training_set_df
    .head(3)
)
I get the error: "Some of types cannot be determined after inferring".
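(For context, the same message is easy to reproduce in plain PySpark whenever schema inference only ever sees None for a field; the standalone repro below is my own sketch, not taken from the feature store code.)

from pyspark.sql import Row

# Standalone repro: inferring a schema from Python rows where a field
# is always None raises the same ValueError in plain PySpark
rows = [Row(id=1, feat=None), Row(id=2, feat=None)]
spark.createDataFrame(rows)  # ValueError: Some of types cannot be determined after inferring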
I tried two alternative solutions:
- Option 1: removing from the lookups the fields that contain only null values (within the current set of primary keys; of course I don't have an entire column of nulls in the overall feature table). A sketch of this is shown after Option 2's snippet.
- Option 2: retrieving the schema (combined_schema) of the features while I create the lookups, and defining training_set_df like:
training_set_df = spark.createDataFrame(
    training_set.load_df().collect(),
    schema=combined_schema
)
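(And this is roughly what I mean by Option 1; a minimal sketch, assuming the feature table name and a single lookup, with placeholder names throughout.)

from pyspark.sql import functions as F
from databricks.feature_engineering import FeatureLookup

# Placeholder table name; in my real code this comes from config
feature_table = "my_catalog.my_schema.my_feature_table"

# Restrict the feature table to the primary keys present in label_df,
# then count non-null values per feature column
subset = (
    spark.table(feature_table)
    .join(label_df.select(*primary_keys), on=primary_keys, how="inner")
)
non_null_counts = subset.select(
    [F.count(F.col(c)).alias(c) for c in subset.columns if c not in primary_keys]
).first().asDict()

# Keep only the features with at least one non-null value in this subset
usable_features = [c for c, n in non_null_counts.items() if n > 0]

lookups_list = [
    FeatureLookup(
        table_name=feature_table,
        feature_names=usable_features,
        lookup_key=primary_keys,
    )
]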
Neither option actually worked: I still get the same error mentioned above. So, two questions for you:
- Why is load_df not able to infer the schema from the feature store, even when the subset selected for training contains all nulls in one or more columns? The feature store knows the actual types!
- How can I solve the problem on my end?
Thanks!