Machine Learning

Feature tables & Null Values

__paolo_c__
Contributor II

Hi!

I was wondering whether any of you has dealt with feature tables and null values (more specifically via the feature engineering objects rather than the feature store ones, although I don't think it really matters).

In brief, null values can be stored in feature tables (as long as they aren't in the primary keys, of course), since some models (mainly those from the "tree family") can deal with them.

However, the problem I am facing now (it is my first time with null values in feature tables, to be frank) concerns retrieving the DataFrame when the time for training comes. I can correctly define training_set_df as:

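# fe is the feature engineering client; label_df carries the primary keys and the TARGET column
training_set = fe.create_training_set(
  df=label_df,
  feature_lookups=lookups_list,
  label="TARGET",
  exclude_columns=primary_keys
)

# Returns a Spark DataFrame; nothing is computed until an action runs
training_set_df = training_set.load_df()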

But that is only lazily evaluated. If I try to actually use training_set_df, for example:

display(
  training_set_df
  .head(3)
)

I get the error: Some of types cannot be determined after inferring.
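One possible cause worth ruling out, sketched under assumptions: in PySpark, `DataFrame.head(n)` returns a plain Python list of `Row` objects rather than a DataFrame, so anything that later turns those rows back into a DataFrame has to infer each column type from the values, and a column holding only `None` gives it nothing to infer from. `DataFrame.limit(n)` instead returns a DataFrame that keeps its schema. Whether `display` re-infers types from such a list is an assumption here, not something confirmed in this post, and `preview` is a hypothetical helper name.

```python
def preview(df, n=3):
    """Return the first n rows as a DataFrame, keeping the original schema.

    Hypothetical helper: df is expected to be a pyspark.sql.DataFrame.
    limit(n) stays lazy and typed, whereas head(n) would return a list of
    Row objects with no schema attached.
    """
    return df.limit(n)


# In a Databricks notebook (assumed usage):
#   display(preview(training_set_df))
#   training_set_df.printSchema()   # shows column types without collecting rows
```

If the schema printed this way already shows the expected types for the all-null columns, the inference error is more likely raised when rows are converted back into a DataFrame than by `load_df()` itself.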

I tried two alternative solutions:

  • Option n.1: remove from the lookups the fields that contain only null values (only within the current set of primary keys; of course I don't have an entire column of nulls in the overall feature table)
  • Option n.2: retrieve the schema of the features (combined_schema) while I create the lookups, and define training_set_df as:
training_set_df = spark.createDataFrame(
  training_set.load_df().collect(),
  schema=combined_schema
)
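A note on Option n.2, as a sketch rather than a tested fix: `collect()` pulls every row to the driver before the DataFrame is rebuilt. If the aim is only to pin the column types, one alternative is to cast the columns on the DataFrame returned by `load_df()` itself. The helper name `cast_expressions` and the column names and types below are hypothetical; `selectExpr` is the standard PySpark DataFrame method.

```python
def cast_expressions(column_types):
    """Build Spark SQL CAST expressions from a {column name: type name} mapping.

    Hypothetical helper; type names are Spark SQL types such as "double" or
    "string". Backticks keep column names with unusual characters valid.
    """
    return [f"CAST(`{name}` AS {dtype}) AS `{name}`"
            for name, dtype in column_types.items()]


# Placeholder columns, not taken from the post.
exprs = cast_expressions({"age": "int", "income": "double"})
print(exprs)

# In a notebook (assumed usage), without collecting to the driver:
#   training_set_df = training_set.load_df().selectExpr(*exprs)
# selectExpr keeps only the listed columns, so the mapping should cover every
# column needed for training, including the label.
# If combined_schema is a StructType, the mapping could be derived as:
#   {f.name: f.dataType.simpleString() for f in combined_schema.fields}
```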

Neither option worked: I still get the same error mentioned above. So, two questions for you:

  1. Why is load_df not able to infer the schema from the feature store, even when the subset selected for training contains only nulls (in one or more columns)? The feature store knows the actual types!
  2. How can I solve the problem on my end?

Thanks!
