
Feature tables & Null Values

__paolo_c__
Contributor II

Hi!

I was wondering whether any of you has ever dealt with feature tables and null values (more specifically through the Feature Engineering objects rather than the Feature Store ones, although I don't think it really matters).

In brief, null values can be stored in feature tables (as long as they aren't in the primary keys, of course), because some models (mainly tree-based ones) can handle them.

However, the problem I'm facing now (to be frank, it's my first time with null values in feature tables) is with retrieving the DataFrame when it's time to train. I can correctly define training_set_df as:

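from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()

# label_df, lookups_list (the feature lookups) and primary_keys are defined earlier
training_set = fe.create_training_set(
    df=label_df,
    feature_lookups=lookups_list,
    label="TARGET",
    exclude_columns=primary_keys,
)

training_set_df = training_set.load_df()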

But that is only evaluated lazily: as soon as I actually use training_set_df, for example:

display(
  training_set_df
  .head(3)
)

I get the error: Some of types cannot be determined after inferring.
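The error text matches the message PySpark raises when it has to infer a schema from local Python values (for example, `spark.createDataFrame` on a list of `Row` objects) and some column holds only `None`. Note that `DataFrame.head(3)` returns a Python list of `Row` objects rather than a DataFrame, so the types known to the feature table may not travel with it; if `display()` has to turn that list back into a DataFrame, the types would be re-inferred from the values alone. The sketch below is a simplified, hypothetical model of that inference step in plain Python (the function names are illustrative and this is not PySpark's implementation): each column takes the type of its first non-null value, and a column that is `None` in every sampled row stays undetermined.

```python
from typing import Any, Dict, List, Optional

# Simplified, hypothetical model of inferring a schema from local rows.
# This is NOT PySpark's implementation; it only mirrors the idea that
# a None value carries no type information.


def infer_column_types(rows: List[Dict[str, Any]]) -> Dict[str, Optional[type]]:
    """Map each column to the type of its first non-null value, or None."""
    inferred: Dict[str, Optional[type]] = {}
    for row in rows:
        for column, value in row.items():
            inferred.setdefault(column, None)
            if value is not None and inferred[column] is None:
                inferred[column] = type(value)
    return inferred


def infer_schema_or_fail(rows: List[Dict[str, Any]]) -> Dict[str, type]:
    """Raise, as local inference does, when some column is null in every row."""
    inferred = infer_column_types(rows)
    undetermined = sorted(c for c, t in inferred.items() if t is None)
    if undetermined:
        raise ValueError(
            f"Some of types cannot be determined after inferring: {undetermined}"
        )
    return {c: t for c, t in inferred.items() if t is not None}


# Three sampled rows, like head(3): feature_b happens to be null in all of them.
sample = [
    {"customer_id": "c1", "feature_a": 0.5, "feature_b": None},
    {"customer_id": "c2", "feature_a": None, "feature_b": None},
    {"customer_id": "c3", "feature_a": 1.2, "feature_b": None},
]

print(infer_column_types(sample))
# {'customer_id': <class 'str'>, 'feature_a': <class 'float'>, 'feature_b': None}
```

Calling `infer_schema_or_fail(sample)` raises a `ValueError` naming `feature_b`, even though `feature_a` is also null in one row. If this model matches what happens in the notebook, keeping the result as a Spark DataFrame, e.g. `display(training_set_df.limit(3))` (`limit` returns a DataFrame with the same schema), may sidestep the re-inference; this is an untested suggestion.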

I tried two alternative solutions:

  • Option n.1: removing from the lookups the fields that contain only null values (within the current set of primary keys; of course, no column is entirely null across the whole feature table)
  • Option n.2: retrieving the schema of the features (combined_schema) while creating the lookups, and then defining training_set_df as:
training_set_df = spark.createDataFrame(
  training_set.load_df().collect(),
  schema=combined_schema
)
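Option n.2 rests on the idea that an explicit schema removes the need for inference. The following plain-Python sketch is a hypothetical model of that idea, not the PySpark API (`apply_schema` and the `{column: type}` dict are illustrative names): with the types declared up front, a column that is null in every row is simply a nullable column and is accepted. Under that model, if the error still appears, a plausible but unverified explanation is that a later step, such as `display(training_set_df.head(3))`, turns the data back into local rows and infers the types again.

```python
from typing import Any, Dict, List

# Hypothetical model of building rows against a declared schema:
# nothing is inferred, so all-null columns are not a problem.


def apply_schema(rows: List[Dict[str, Any]], schema: Dict[str, type]) -> List[Dict[str, Any]]:
    """Validate rows against {column: type}; None is accepted for every column."""
    checked = []
    for row in rows:
        unknown = set(row) - set(schema)
        if unknown:
            raise KeyError(f"columns not in schema: {sorted(unknown)}")
        for column, expected in schema.items():
            value = row.get(column)  # missing columns are treated as null
            if value is not None and not isinstance(value, expected):
                raise TypeError(
                    f"{column!r}: expected {expected.__name__}, got {type(value).__name__}"
                )
        # Emit columns in schema order, filling missing ones with None.
        checked.append({column: row.get(column) for column in schema})
    return checked


declared = {"customer_id": str, "feature_a": float, "feature_b": float}
sample = [
    {"customer_id": "c1", "feature_a": 0.5, "feature_b": None},
    {"customer_id": "c2", "feature_a": None, "feature_b": None},
]

rows = apply_schema(sample, declared)
print(rows[1])  # {'customer_id': 'c2', 'feature_a': None, 'feature_b': None}
```

The declared types, not the sampled values, decide each column's type here, which is why the all-null `feature_b` passes.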

Neither option worked: I still get the same error mentioned above. So, two questions for you:

  1. Why is load_df not able to infer the schema from the feature store when the subset selected for training contains only nulls in one or more columns? The feature store knows the actual types!
  2. How can I solve the problem on my end?

Thanks!
