Databricks Community

ostae911 · 06-19-2025

We are running AutoML Forecast on Databricks Runtime 15.4 ML LTS and 16.4 ML LTS, using a time series dataset with temporal covariates from the Feature Store (e.g. a corona_dummy feature). We use feature_store_lookups with lookup_key and timestamp_lookup_key.

Our feature table is defined like this:

fs.create_table(
  name="...features.example_corona_features",
  primary_keys=["Monat", "Produkt", "Vertriebstyp_Art"],
  df=...,
  timestamp_keys="Monat",
  ...
)

And we call AutoML with:

feature_store_lookups=[{
  "table_name": "...features.example_corona_features",
  "lookup_key": ["Produkt", "Vertriebstyp_Art"],
  "timestamp_lookup_key": "Monat"
}]

✅ Expected:

AutoML performs a temporal join between the dataset and the feature table (on the lookup keys and the timestamp) and proceeds with training, including the covariate corona_dummy.
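For reference, the expected point-in-time semantics can be sketched in plain pandas with `merge_asof`: for each training row, take the latest feature row with the same `Produkt`/`Vertriebstyp_Art` whose `Monat` is at or before the row's `Monat`. This is only an illustration of the expected behavior, not AutoML's implementation. The sample rows, the helper name, and the target column `Absatz` are made up.

```python
import pandas as pd

# Made-up training rows; "Absatz" is a hypothetical target column.
train = pd.DataFrame({
    "Monat": pd.to_datetime(["2020-02-01", "2020-04-01", "2020-04-01"]),
    "Produkt": ["A", "A", "B"],
    "Vertriebstyp_Art": ["X", "X", "Y"],
    "Absatz": [100, 80, 50],
})

# Made-up feature table rows, keyed by the lookup keys plus the timestamp key.
features = pd.DataFrame({
    "Monat": pd.to_datetime(["2020-01-01", "2020-03-01", "2020-03-01"]),
    "Produkt": ["A", "A", "B"],
    "Vertriebstyp_Art": ["X", "X", "Y"],
    "corona_dummy": [0, 1, 1],
})


def point_in_time_join(df, feats, lookup_key, ts_key):
    """Hypothetical helper: as-of join on ts_key, exact match on lookup_key."""
    # merge_asof requires both inputs to be sorted by the "on" column.
    left = df.sort_values(ts_key)
    right = feats.sort_values(ts_key)
    # direction="backward": latest feature row with timestamp <= row timestamp.
    return pd.merge_asof(left, right, on=ts_key, by=lookup_key, direction="backward")


joined = point_in_time_join(train, features, ["Produkt", "Vertriebstyp_Art"], "Monat")
print(joined)
```

The timestamp key shows up only once in the joined result, even though `Monat` is both a primary key and the timestamp lookup key.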

❌ Actual:

AutoML starts the run but fails during an internal applyInPandas() or .toPandas() conversion, throwing:

ValueError: Length mismatch: Expected axis has 8 elements, new values have 11 elements

This crash occurs after the features are joined and the training set is loaded, i.e., during AutoML’s internal training loop.
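We can't see AutoML's internals, but the message itself is the one pandas raises when a DataFrame's column labels are overwritten with a list of a different length. A minimal reproduction, assuming that is what happens here (an 8-column frame being relabeled with 11 names). The `relabel` helper and the column names are hypothetical:

```python
import pandas as pd


def relabel(pdf: pd.DataFrame, expected_columns: list) -> pd.DataFrame:
    """Hypothetical step that forces a fixed list of column names onto a frame."""
    pdf = pdf.copy()
    # Assigning a label list of the wrong length raises "Length mismatch".
    pdf.columns = expected_columns
    return pdf


batch = pd.DataFrame([[0] * 8], columns=[f"c{i}" for i in range(8)])  # 8 columns
schema = [f"col{i}" for i in range(11)]  # hypothetical 11-name schema

message = ""
try:
    relabel(batch, schema)
except ValueError as err:
    message = str(err)
print(message)
```

If that assumption holds, the frame coming out of the join has three fewer columns than the list of names applied to it, which points at the joined data rather than the model code.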

🔍 Observations:

When we remove the feature_store_lookups, AutoML completes without errors.
The issue appears only when the timestamp column (Monat) is both:
- present in primary_keys, and
- passed again as timestamp_lookup_key
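A hypothetical sanity check for the lookup configuration, assuming the documented contract for time series feature tables: `lookup_key` lists the primary keys other than the timestamp key (matched by position), and `timestamp_lookup_key` names the dataset column matched against the timestamp key. The function name and the `Absatz` column are illustrative only.

```python
def check_timestamp_lookup(primary_keys, timestamp_key, lookup_key,
                           timestamp_lookup_key, dataset_columns):
    """Hypothetical check of a time series lookup; returns a list of problems."""
    problems = []
    # lookup_key should cover exactly the primary keys other than the timestamp key.
    non_ts_keys = [k for k in primary_keys if k != timestamp_key]
    if len(lookup_key) != len(non_ts_keys):
        problems.append(
            f"lookup_key has {len(lookup_key)} columns, expected {len(non_ts_keys)} "
            f"(matching {non_ts_keys})"
        )
    # The timestamp column belongs in timestamp_lookup_key only.
    if timestamp_lookup_key in lookup_key:
        problems.append("timestamp column is also listed in lookup_key")
    # Every lookup column must exist in the training dataset.
    for col in [*lookup_key, timestamp_lookup_key]:
        if col not in dataset_columns:
            problems.append(f"column {col!r} is missing from the dataset")
    return problems


dataset_columns = ["Monat", "Produkt", "Vertriebstyp_Art", "Absatz"]  # "Absatz" is made up
pks = ["Monat", "Produkt", "Vertriebstyp_Art"]

ok = check_timestamp_lookup(pks, "Monat", ["Produkt", "Vertriebstyp_Art"], "Monat", dataset_columns)
bad = check_timestamp_lookup(pks, "Monat", ["Monat", "Produkt", "Vertriebstyp_Art"], "Monat", dataset_columns)
print(ok)  # []
print(bad)
```

Under this assumption the configuration in the post is valid; listing `Monat` in `lookup_key` as well would be the misconfiguration.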

Can you confirm whether this is a known issue, and what the correct contract is for using feature_store_lookups with timestamp_lookup_key in AutoML?

jamesl · 10-10-2025

Hi @ostae911, are you still facing this issue?

It looks like your usage of the timestamp column is correct; it can be used as a primary key on the time series feature table. Is it possible that other columns are duplicated between the training dataframe and the `example_corona_features` table?
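One quick way to check this is to compare column names before the join. The helper below is hypothetical and assumes that only the non-key columns of the feature table are added to the training set. On Databricks you would pass lists such as `training_df.columns` and `spark.table(table_name).columns`.

```python
def colliding_feature_columns(dataset_columns, feature_table_columns, feature_primary_keys):
    """Hypothetical helper: feature columns whose names already exist in the dataset."""
    existing = set(dataset_columns)
    keys = set(feature_primary_keys)
    # Assumption: key columns of the feature table are not added as features.
    features = [c for c in feature_table_columns if c not in keys]
    return [c for c in features if c in existing]


table_cols = ["Monat", "Produkt", "Vertriebstyp_Art", "corona_dummy"]
table_pks = ["Monat", "Produkt", "Vertriebstyp_Art"]

clean = colliding_feature_columns(["Monat", "Produkt", "Vertriebstyp_Art", "Absatz"], table_cols, table_pks)
clash = colliding_feature_columns(["Monat", "Produkt", "Vertriebstyp_Art", "Absatz", "corona_dummy"], table_cols, table_pks)
print(clean)  # []
print(clash)  # ['corona_dummy']
```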

You could try creating the training set first and then passing it to `automl.forecast` without the list of feature lookups. You may also be able to rename features if there are indeed duplicate column names.
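A pandas sketch of building the training set yourself with the colliding feature renamed, assuming (hypothetically) that the dataset already carries its own `corona_dummy` column. On Databricks the equivalent would presumably be a `FeatureLookup` with `rename_outputs`, then `create_training_set(...).load_df()`, and then `automl.forecast` without `feature_store_lookups`; please check the Feature Engineering client docs for your runtime. The helper name and data are illustrative only.

```python
import pandas as pd


def build_training_frame(df, feats, lookup_key, ts_key, rename=None):
    """Hypothetical pandas stand-in for building the training set before AutoML."""
    # Rename feature columns first so none of them collides with a dataset column.
    feats = feats.rename(columns=rename or {})
    # Point-in-time join: latest feature row at or before each row's timestamp.
    return pd.merge_asof(
        df.sort_values(ts_key),
        feats.sort_values(ts_key),
        on=ts_key,
        by=lookup_key,
        direction="backward",
    )


# Made-up data: the dataset already has its own corona_dummy column.
train = pd.DataFrame({
    "Monat": pd.to_datetime(["2020-02-01", "2020-04-01"]),
    "Produkt": ["A", "A"],
    "Vertriebstyp_Art": ["X", "X"],
    "corona_dummy": [0, 0],
})
features = pd.DataFrame({
    "Monat": pd.to_datetime(["2020-01-01", "2020-03-01"]),
    "Produkt": ["A", "A"],
    "Vertriebstyp_Art": ["X", "X"],
    "corona_dummy": [0, 1],
})

training = build_training_frame(
    train, features, ["Produkt", "Vertriebstyp_Art"], "Monat",
    rename={"corona_dummy": "corona_dummy_fs"},
)
print(training.columns.tolist())
```

Without the rename, `merge_asof` would not fail here; it would silently add `_x`/`_y` suffixes, so a later step that expects the original names could break instead.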

I hope that helps, but feel free to ask any follow-up questions. If this reply does resolve the issue, please click the "Accept Solution" button to let us know!

-James