Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Feature tables & Null Values

__paolo_c__
Contributor II

Hi!

I was wondering if any of you have ever dealt with feature tables and null values (more specifically, via feature engineering objects rather than the feature store, although I don't think it really matters).

In brief, null values are allowed in feature tables (as long as they aren't in the primary keys, of course), since some models (mainly those from the "tree family") can deal with them.

However, the problem I am facing now (first time with null values in feature tables, to be frank) is related to retrieving the DataFrame when training time comes: I can correctly define the training_set_df as:

training_set = fe.create_training_set(
  df=label_df,
  feature_lookups=lookups_list,
  label="TARGET",
  exclude_columns=primary_keys
 )
 
training_set_df = training_set.load_df()

But that's only the lazy evaluation; if I try to actually use training_set_df, like:

display(
  training_set_df
  .head(3)
)

I get the error: Some of types cannot be determined after inferring.

I tried two alternative solutions:

  • Option 1: removing from the lookups the fields that have only null values (within the current set of primary keys; of course, I don't have an entire column of nulls in the overall feature table)
  • Option 2: retrieving the schema (combined_schema) of the features while creating the lookups, and defining training_set_df like:
training_set_df = spark.createDataFrame(
  training_set.load_df().collect(),
  schema=combined_schema
)

Neither of the options above actually worked: I still get the same error mentioned above. So, two questions for you:

  1. Why is load_df not able to infer the schema from the feature store, even when the subset selected for training contains only nulls (in one or more columns)? The feature store knows the actual types!
  2. How can I solve the problem on my end?

Thanks!

1 REPLY

mark_ott
Databricks Employee

When dealing with feature tables and null values—especially via Databricks Feature Engineering objects (but also more broadly in Spark or feature platforms)—there are some nuanced behaviors when schema inference is required. Here are clear answers to your two questions, supported by insights into Spark’s and Databricks Feature Engineering’s internals.

1. Why does load_df fail to infer schema when columns have only NULLs?

Root cause: If a column in the data frame contains only nulls (at least in your current selection/partition, not globally), Spark (which underlies Databricks Feature Engineering’s DataFrame operations) cannot infer the column’s type. This is because Spark’s default type inference looks at actual values, and a column of all nulls is typeless in practice. Unless the schema is explicitly provided or committed as metadata in the upstream feature table, the DataFrame ends up with columns of type NullType, which leads to ambiguous errors like “Some of types cannot be determined after inferring”.

Even if the Feature Store (or source table) knows the type in its metadata, the lazy-evaluated DataFrame produced by training_set.load_df() tries to infer the type based on the physical data pulled into your current partition, which could all be nulls due to filtering (such as with your current join/lookup selection).
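To make this concrete, here is a minimal standalone repro of the same Spark behavior, entirely outside the feature store (the column name is made up for the example): a column that holds only None values gives Spark nothing to infer a type from, so createDataFrame without an explicit schema raises exactly this error.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Two rows whose second column is always None: Spark has no value to look at,
# so schema inference fails for that column.
rows = [(1, None), (2, None)]

try:
    spark.createDataFrame(rows, ["id", "all_null_feature"])
except ValueError as e:
    print(e)  # "Some of types cannot be determined after inferring"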

2. How can you solve the problem on your end?

Recommended Solutions

A. Explicitly Provide the Schema

  • When loading your DataFrame, you can explicitly set the schema for all columns, or at least those that might contain only nulls. This overrides Spark’s inference mechanism and “tells” it what type to expect.

  • This can be achieved either when materializing the upstream feature table, or by constructing the DataFrame with the exact schema, as with your attempted approach.

  • Example:

    combined_schema = ...  # build this from your feature metadata/registry

    df_with_schema = spark.createDataFrame(
      training_set.load_df().collect(),
      schema=combined_schema
    )

    If this still fails, ensure combined_schema faithfully matches the source feature table’s column types (as registered in your feature store, not guessed from the null-containing DataFrame); see the sketch below for one way to build it.
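As a rough sketch, combined_schema can be built by hand with explicit Spark types; the field names here are made up for illustration, and the real names and types should come from what is registered in your feature tables, not from the (possibly all-null) data in the current training slice:

from pyspark.sql.types import StructType, StructField, DoubleType, StringType

# Hypothetical schema mirroring the registered feature types
combined_schema = StructType([
    StructField("avg_basket_value", DoubleType(), True),  # a looked-up feature that may be all-null here
    StructField("segment", StringType(), True),           # another looked-up feature
    StructField("TARGET", DoubleType(), True),             # the label column
])

training_set_df = spark.createDataFrame(
    training_set.load_df().collect(),
    schema=combined_schema
)

Keep in mind that collect() pulls the entire training set onto the driver, so this is fine for modest datasets, while the casting approach in option B below tends to scale better.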

B. Fill Missing Columns with Defaults Prior to Inference

  • Before using the DataFrame (e.g., before collect or display), fill any all-null columns with a dummy value (appropriate to their type), then cast back if needed:

    from pyspark.sql import functions as F
    from pyspark.sql.types import NullType

    # Identify columns that were inferred as NullType
    inferred_schema = training_set.load_df().schema
    null_columns = [f.name for f in inferred_schema.fields if isinstance(f.dataType, NullType)]

    for col_name in null_columns:
        # Replace nulls with a default value, for example 0 for numeric, '' for string;
        # swap "desired_type" for the column's real type, e.g. "double"
        training_set_df = training_set_df.withColumn(col_name, F.lit(0).cast("desired_type"))
  • Once you have this working, you can revert/double-check your feature engineering logic to not select partitions that are known to include only nulls for given columns unless unavoidable.
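A variation on the same idea, assuming the feature table lives in Unity Catalog (so it can be read as an ordinary Delta table; the table name below is illustrative): take the authoritative types from the table itself and cast only the columns that came through as NullType, rather than guessing a dummy value.

from pyspark.sql import functions as F
from pyspark.sql.types import NullType

feature_table_name = "my_catalog.my_schema.customer_features"  # illustrative name
registered_types = {f.name: f.dataType for f in spark.table(feature_table_name).schema.fields}

training_set_df = training_set.load_df()
for field in training_set_df.schema.fields:
    if isinstance(field.dataType, NullType) and field.name in registered_types:
        # Cast the all-null column to the type registered in the feature table
        training_set_df = training_set_df.withColumn(
            field.name, F.col(field.name).cast(registered_types[field.name])
        )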

C. Do Not Remove Columns with Nulls Only in Current Selection

  • Removing columns from your lookups where only the current slice has all nulls tends to be unreliable, because another partition/slice might have non-nulls, and schema drift might result.

Additional Tips

  • Double-check your feature store’s metadata (or table definition) for the expected schema. In Databricks Feature Engineering, you can often retrieve this directly via the API (see feature table describe/preview in the Databricks UI or via catalog commands).

  • If you join multiple sources, ensure that data types are aligned. Mismatches (e.g., joining a string and a numeric type) can also induce inference issues when nulls dominate one side.

  • As a stability guard, if your workflow allows, materialize the DataFrame to persistent storage (e.g., save as Parquet with schema) and reload.
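For the first tip, a quick way to check the registered types (assuming a Unity Catalog feature table, which is backed by a regular Delta table; the table name is illustrative):

# Both calls show the column types the feature table actually has registered
spark.table("my_catalog.my_schema.customer_features").printSchema()
spark.sql("DESCRIBE TABLE my_catalog.my_schema.customer_features").show(truncate=False)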



In summary: The problem is rooted in Spark’s inability to infer types for all-null columns at runtime, despite metadata being available in the feature store. The fix is to supply the schema explicitly at DataFrame creation, or fill those columns with default values to “nudge” Spark’s inference, using the registered feature types as the ground truth.