
AutoML Forecast fails when using feature_store_lookups with timestamp key

ostae911
New Contributor

We are running AutoML Forecast on Databricks Runtime 15.4 ML LTS and 16.4 ML LTS, using a time series dataset with temporal covariates from the Feature Store (e.g. a corona_dummy feature). We use feature_store_lookups with lookup_key and timestamp_lookup_key.

Our feature table is defined like this:

fs.create_table(
  name="...features.example_corona_features",
  primary_keys=["Monat", "Produkt", "Vertriebstyp_Art"],
  df=...,
  timestamp_keys="Monat",
  ...
)

And we call AutoML with:

feature_store_lookups=[{
  "table_name": "...features.example_corona_features",
  "lookup_key": ["Produkt", "Vertriebstyp_Art"],
  "timestamp_lookup_key": "Monat"
}]

Expected:

AutoML performs a temporal join between the dataset and the feature table (via timestamp and keys) and proceeds with training including the covariate corona_dummy.
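To make the expected behaviour concrete, here is a minimal pure-Python sketch of a point-in-time (as-of) lookup: for each training row, take the latest feature record with the same lookup keys whose timestamp is not later than the row's. This only illustrates the semantics we expect and is not AutoML's or the Feature Store's actual implementation. The helper `point_in_time_lookup`, the target column `y`, and all values are made up for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_lookup(rows, features, keys, ts_key, feature_cols):
    """Attach to each row the latest feature values (same keys) with
    timestamp <= the row's timestamp; use None when nothing matches."""
    # Group feature records by lookup key and sort each group by timestamp.
    grouped = defaultdict(list)
    for rec in features:
        grouped[tuple(rec[k] for k in keys)].append(rec)
    for recs in grouped.values():
        recs.sort(key=lambda r: r[ts_key])

    joined = []
    for row in rows:
        recs = grouped.get(tuple(row[k] for k in keys), [])
        timestamps = [r[ts_key] for r in recs]
        # Index of the last feature record at or before the row's timestamp.
        idx = bisect_right(timestamps, row[ts_key]) - 1
        match = recs[idx] if idx >= 0 else None
        extra = {c: (match[c] if match else None) for c in feature_cols}
        joined.append({**row, **extra})
    return joined


# Illustrative data only; "YYYY-MM" strings sort chronologically.
features = [
    {"Produkt": "A", "Vertriebstyp_Art": "X", "Monat": "2020-03", "corona_dummy": 1},
    {"Produkt": "A", "Vertriebstyp_Art": "X", "Monat": "2021-07", "corona_dummy": 0},
]
rows = [
    {"Produkt": "A", "Vertriebstyp_Art": "X", "Monat": "2020-01", "y": 10},
    {"Produkt": "A", "Vertriebstyp_Art": "X", "Monat": "2020-05", "y": 12},
    {"Produkt": "A", "Vertriebstyp_Art": "X", "Monat": "2021-08", "y": 15},
]

joined = point_in_time_lookup(
    rows, features, ["Produkt", "Vertriebstyp_Art"], "Monat", ["corona_dummy"]
)
```

Note that the joined rows keep exactly the training columns plus the looked-up feature column; the timestamp and key columns are not duplicated.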


Actual:

AutoML starts the run, but fails during an internal applyInPandas() or .toPandas() conversion, throwing:

ValueError: Length mismatch: Expected axis has 8 elements, new values have 11 elements

This crash occurs after the features are joined and the training set is loaded, i.e., during execution of AutoML’s internal training loop.

🔍 Observations:

  • When we remove the feature_store_lookups, AutoML completes without errors.

  • The issue appears only when the timestamp column (Monat) is both:

    • present in primary_keys, and

    • passed again as timestamp_lookup_key

Can you confirm if this is a known issue, and what the correct contract is for using feature_store_lookups with timestamp_lookup_key in AutoML?


jamesl
Databricks Employee

Hi @ostae911, are you still facing this issue?

It looks like your usage of the timestamp column is correct. It can be used as a primary key on the time series feature table. Is it possible that there are other duplicate columns between the training dataframe and the `example_corona_features` table?

You could try creating the training set first and then passing it to `automl.forecast` without the list of feature lookups. If there are indeed duplicate column names, you may also be able to rename the features.
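As a rough way to check for duplicates, you could compare the column lists of the two tables while ignoring the join keys. The sketch below works on plain lists of column names (on a Spark DataFrame, `df.columns` gives you such a list). The helper names, the `_feat` suffix, the `Absatz` column, and the example column lists are hypothetical and only for illustration.

```python
def overlapping_columns(train_cols, feature_cols, join_keys):
    """Return column names found on both sides that are not join keys."""
    join_keys = set(join_keys)
    feature_set = set(feature_cols)
    return [c for c in train_cols if c in feature_set and c not in join_keys]


def rename_map(duplicates, suffix="_feat"):
    """Build an old-name -> new-name mapping for the clashing columns."""
    return {c: f"{c}{suffix}" for c in duplicates}


# Hypothetical column lists: here the training data already has corona_dummy.
train_cols = ["Monat", "Produkt", "Vertriebstyp_Art", "Absatz", "corona_dummy"]
feature_cols = ["Monat", "Produkt", "Vertriebstyp_Art", "corona_dummy"]
join_keys = ["Produkt", "Vertriebstyp_Art", "Monat"]

duplicates = overlapping_columns(train_cols, feature_cols, join_keys)
renames = rename_map(duplicates)
print(duplicates, renames)
```

If `duplicates` comes back non-empty, renaming or dropping those columns on one side before the join would be the first thing to try.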

I hope that helps, but feel free to ask any follow-up questions. If this reply does resolve the issue, please click the "Accept Solution" button to let us know!

-James

 
