
Feature Store - lookback_window does not work with primary keys of "date" type

Kjetil
Contributor

I just discovered what I believe is a bug in Feature Store. The expected value of the "value" column is NULL, because the only feature row is dated 2024-01-01 and so falls outside the 3-day lookback window from 2024-01-07, but the actual value is "a". If I instead make the "date" column a timestamp (i.e. remove the .date() call when generating the date value in the feature table), the result is NULL as expected.

Databricks runtime: 14.3 LTS ML (includes Apache Spark 3.5.0, Scala 2.12)

The code that re-creates the issue:

```python
import datetime as dt

from pyspark.sql import Row
from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup

feature_table_catalog_path = "catalog.schema.table"  # insert your own Unity Catalog path

# Feature table with a single row whose "date" column has DATE type (note the .date() call).
feature_table_data = [
    Row(id=1, date=dt.datetime(2024, 1, 1).date(), value="a"),
]
feature_table = spark.createDataFrame(feature_table_data)
fe = FeatureEngineeringClient()

fe.create_table(
    name=feature_table_catalog_path,
    primary_keys=["id", "date"],
    schema=feature_table.schema,
    timeseries_columns="date",
)
fe.write_table(
    name=feature_table_catalog_path,
    df=feature_table,
    mode="merge",
)

# Lookup row dated six days after the feature row, i.e. outside the 3-day window.
dataset_with_target = [
    Row(id=1, date=dt.datetime(2024, 1, 7), target=1),
]
dataset_with_target = spark.createDataFrame(dataset_with_target)

feature_lookup = FeatureLookup(
    table_name=feature_table_catalog_path,
    lookup_key="id",
    timestamp_lookup_key="date",
    lookback_window=dt.timedelta(days=3),
)
training_dataset = fe.create_training_set(
    df=dataset_with_target,
    feature_lookups=[feature_lookup],
    label="target",
)
# Expected: value is NULL. Actual with a DATE-typed column: value is "a".
training_dataset.load_df().show()
```
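To make the expected result concrete, here is a plain-Python sketch of how a point-in-time lookup with a lookback window is generally understood to behave: take the latest feature row at or before the lookup timestamp, but only if it is no older than the window. This is not the Feature Store implementation; the function name `point_in_time_lookup` is hypothetical, and treating both window boundaries as inclusive is an assumption.

```python
import datetime as dt


def point_in_time_lookup(feature_rows, lookup_ts, lookback_window=None):
    """Return the latest feature value at or before lookup_ts, or None.

    feature_rows is a list of (timestamp, value) pairs. If lookback_window
    is given, rows older than lookup_ts - lookback_window are ignored.
    Boundary inclusivity here is an assumption, not documented behavior.
    """
    candidates = [(ts, value) for ts, value in feature_rows if ts <= lookup_ts]
    if lookback_window is not None:
        earliest = lookup_ts - lookback_window
        candidates = [(ts, value) for ts, value in candidates if ts >= earliest]
    if not candidates:
        return None
    # Pick the most recent remaining row.
    return max(candidates, key=lambda pair: pair[0])[1]


# Same values as in the reproduction above.
feature_rows = [(dt.datetime(2024, 1, 1), "a")]
lookup_ts = dt.datetime(2024, 1, 7)

with_window = point_in_time_lookup(feature_rows, lookup_ts, dt.timedelta(days=3))
without_window = point_in_time_lookup(feature_rows, lookup_ts)
print(with_window, without_window)
```

With the 3-day window the row from 2024-01-01 is excluded, which corresponds to the NULL I expected. Without a window the row is returned, which matches the "a" I actually get when the column has date type.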

2 REPLIES

Retired_mod
Esteemed Contributor III

Hi @Kjetil, this seems related to how date types are handled. Calling `.date()` strips the time component, so the column is stored as a date rather than a timestamp, which might interfere with the lookback-window lookup.

To address this, try using the full datetime value without stripping the time component. Also make sure your Feature Store client and Databricks runtime are up to date.
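As a minimal sketch of that suggestion, a date value can be promoted to a midnight datetime before the feature table rows are built, so the column is created with timestamp type. The helper name `as_midnight_datetime` is hypothetical, and choosing midnight as the time component is an assumption that may not suit every dataset.

```python
import datetime as dt


def as_midnight_datetime(value):
    """Return value as a datetime, using 00:00:00 when only a date is given."""
    # datetime is a subclass of date, so check for it first.
    if isinstance(value, dt.datetime):
        return value
    return dt.datetime.combine(value, dt.time.min)


feature_date = dt.datetime(2024, 1, 1).date()
feature_ts = as_midnight_datetime(feature_date)
print(type(feature_ts).__name__, feature_ts.isoformat())
```

For a table that already exists as a Spark DataFrame, casting the column to timestamp before writing should have a similar effect, though that has not been verified in this thread.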

Kjetil
Contributor

Thank you for answering. Yes, that is also what I figured out. In other words, the lookback_window argument only works when the time series primary key has timestamp type. I cannot find this behavior described in the documentation.
