
Feature Store - lookback_window does not work with primary keys of "date" type

Kjetil
New Contributor III

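I just discovered what I believe is a bug in Feature Store. In the example below, the feature row is dated 2024-01-01 and the lookup timestamp is 2024-01-07. The row is therefore 6 days old and falls outside the 3-day `lookback_window`, so the expected value of the "value" column is NULL. The actual value, however, is "a". If I instead change the type of the "date" column in the feature table to timestamp (i.e., remove the `.date()` call when generating the date value), the result is NULL, as expected.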

Databricks runtime: 14.3 LTS ML (includes Apache Spark 3.5.0, Scala 2.12)

The code that re-creates the issue:

```python
import datetime as dt
from pyspark.sql import Row
from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup

# `spark` is the SparkSession predefined in Databricks notebooks.
feature_table_catalog_path = "catalog.schema.table"  # insert your own Unity Catalog path

# One feature row dated 2024-01-01. `.date()` makes the "date" column a date type;
# without it the column is a timestamp and the lookback window behaves as expected.
feature_table_data = [
    Row(id=1, date=dt.datetime(2024, 1, 1).date(), value="a")
]
feature_table = spark.createDataFrame(feature_table_data)
fe = FeatureEngineeringClient()

fe.create_table(
    name=feature_table_catalog_path,
    primary_keys=["id", "date"],
    schema=feature_table.schema,
    timeseries_columns='date'
)
fe.write_table(
    name=feature_table_catalog_path,
    df=feature_table,
    mode="merge"
)

# Lookup timestamp 2024-01-07: the feature row is 6 days older,
# so it lies outside the 3-day lookback window.
dataset_with_target = [
    Row(id=1, date=dt.datetime(2024, 1, 7), target=1),
]
dataset_with_target = spark.createDataFrame(dataset_with_target)
feature_lookup = FeatureLookup(
    table_name=feature_table_catalog_path,
    lookup_key="id",
    timestamp_lookup_key="date",
    lookback_window=dt.timedelta(days=3),
)
training_dataset = fe.create_training_set(
    df=dataset_with_target, feature_lookups=[feature_lookup], label='target'
)
# Expected: value is NULL. Actual with a date-typed key: value is "a".
training_dataset.load_df().show()
```
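To make the expected result concrete, the point-in-time lookup with a lookback window can be sketched in plain Python. This is a minimal illustration under assumed semantics: a feature row is eligible if its timestamp is at or before the lookup timestamp and no older than the lookback window (boundary treated as inclusive), and the latest eligible row wins. `point_in_time_lookup` is a hypothetical helper, not part of the Feature Engineering API.

```python
import datetime as dt
from typing import List, Optional, Tuple

FeatureRow = Tuple[int, dt.datetime, str]  # (id, timestamp, value)


def point_in_time_lookup(
    feature_rows: List[FeatureRow],
    lookup_id: int,
    lookup_ts: dt.datetime,
    lookback_window: Optional[dt.timedelta] = None,
) -> Optional[str]:
    """Return the latest value for lookup_id at or before lookup_ts.

    Assumed semantics (hypothetical sketch, not the library's code):
    eligible rows satisfy row_ts <= lookup_ts and, if a window is given,
    row_ts >= lookup_ts - lookback_window. None stands in for NULL.
    """
    candidates = []
    for row_id, row_ts, value in feature_rows:
        # Skip other keys and rows from the future.
        if row_id != lookup_id or row_ts > lookup_ts:
            continue
        # Skip rows older than the lookback window.
        if lookback_window is not None and row_ts < lookup_ts - lookback_window:
            continue
        candidates.append((row_ts, value))
    if not candidates:
        return None
    # Pick the most recent eligible row.
    return max(candidates, key=lambda c: c[0])[1]


# Same data as the repro above.
feature_rows = [(1, dt.datetime(2024, 1, 1), "a")]
lookup_ts = dt.datetime(2024, 1, 7)

print(point_in_time_lookup(feature_rows, 1, lookup_ts))  # → a
print(point_in_time_lookup(feature_rows, 1, lookup_ts, dt.timedelta(days=3)))  # → None
```

With the repro's data, the sketch returns "a" without a window and None (NULL) with the 3-day window, which matches what the thread reports for the timestamp-typed feature table.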


Kjetil
New Contributor III

Thank you for answering. Yes, that is what I found as well: the `lookback_window` argument only works when the timeseries primary key column is of timestamp type, not date type. As far as I can see, this behavior is not described in the documentation.
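A workaround consistent with this thread is to store the timeseries key as a timestamp rather than a date. Below is a minimal sketch, assuming the feature rows are built in Python as in the repro. The hypothetical helper `to_midnight_datetime` turns a `datetime.date` into a `datetime.datetime` at 00:00, so `spark.createDataFrame` infers a timestamp column instead of a date column. For a column that is already a Spark date type, `F.col("date").cast("timestamp")` (with `from pyspark.sql import functions as F`) performs the same conversion. This is a sketch based on the behavior reported here, not an officially documented fix.

```python
import datetime as dt


def to_midnight_datetime(value):
    """Convert a datetime.date to a datetime.datetime at 00:00.

    datetime.datetime is a subclass of datetime.date, so it is checked
    first and returned unchanged.
    """
    if isinstance(value, dt.datetime):
        return value
    if isinstance(value, dt.date):
        return dt.datetime.combine(value, dt.time.min)
    raise TypeError(f"expected date or datetime, got {type(value).__name__}")


# The repro's feature date, converted so the column is inferred as a timestamp.
print(to_midnight_datetime(dt.datetime(2024, 1, 1).date()))  # → 2024-01-01 00:00:00
```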
