<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Feature Store - lookback_window does not work with primary keys of "date" type in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/feature-store-lookback-window-does-not-work-with-primary-keys-of/m-p/82364#M3557</link>
    <description>&lt;P&gt;Thank you for answering. Yes, that is also what I found. In other words, the lookback_window argument only works when the time series primary key is a timestamp, not a date. I cannot find this behavior described in the documentation.&lt;/P&gt;</description>
    <pubDate>Thu, 08 Aug 2024 11:10:18 GMT</pubDate>
    <dc:creator>Kjetil</dc:creator>
    <dc:date>2024-08-08T11:10:18Z</dc:date>
    <item>
      <title>Feature Store - lookback_window does not work with primary keys of "date" type</title>
      <link>https://community.databricks.com/t5/machine-learning/feature-store-lookback-window-does-not-work-with-primary-keys-of/m-p/82021#M3548</link>
      <description>&lt;P&gt;I believe I have found a bug in Feature Store. In the reproduction below, the feature row is dated 2024-01-01 and the lookup is made at 2024-01-07 with a lookback_window of 3 days, so the row falls outside the window. The expected value of the "value" column is therefore NULL, but the actual value is "a". If I instead make the "date" column a timestamp (i.e. remove the .date() call when generating the date value for the feature table), the result is NULL as expected.&lt;/P&gt;
&lt;P&gt;Databricks runtime: 14.3 LTS ML (includes Apache Spark 3.5.0, Scala 2.12)&lt;/P&gt;
&lt;P&gt;The code that reproduces the issue:&lt;/P&gt;
&lt;P&gt;```python&lt;/P&gt;
&lt;DIV&gt;
&lt;DIV&gt;import datetime as dt&lt;/DIV&gt;
&lt;DIV&gt;from pyspark.sql import Row&lt;/DIV&gt;
&lt;DIV&gt;from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;feature_table_catalog_path = "catalog.schema.table"  # insert your own Unity Catalog path&lt;/DIV&gt;
&lt;DIV&gt;# Feature row dated 2024-01-01; .date() makes this a date, not a timestamp&lt;/DIV&gt;
&lt;DIV&gt;feature_table_data = [&lt;/DIV&gt;
&lt;DIV&gt;Row(id=1, date=dt.datetime(2024, 1, 1).date(), value="a")&lt;/DIV&gt;
&lt;DIV&gt;]&lt;/DIV&gt;
&lt;DIV&gt;feature_table = spark.createDataFrame(feature_table_data)&lt;/DIV&gt;
&lt;DIV&gt;fe = FeatureEngineeringClient()&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;fe.create_table(&lt;/DIV&gt;
&lt;DIV&gt;name=feature_table_catalog_path,&lt;/DIV&gt;
&lt;DIV&gt;primary_keys=["id", "date"],&lt;/DIV&gt;
&lt;DIV&gt;schema=feature_table.schema,&lt;/DIV&gt;
&lt;DIV&gt;timeseries_columns='date'&lt;/DIV&gt;
&lt;DIV&gt;)&lt;/DIV&gt;
&lt;DIV&gt;fe.write_table(&lt;/DIV&gt;
&lt;DIV&gt;name=feature_table_catalog_path,&lt;/DIV&gt;
&lt;DIV&gt;df=feature_table,&lt;/DIV&gt;
&lt;DIV&gt;mode="merge"&lt;/DIV&gt;
&lt;DIV&gt;)&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;# Lookup at 2024-01-07 with a 3-day lookback window&lt;/DIV&gt;
&lt;DIV&gt;dataset_with_target = [&lt;/DIV&gt;
&lt;DIV&gt;Row(id=1, date=dt.datetime(2024, 1, 7), target=1),&lt;/DIV&gt;
&lt;DIV&gt;]&lt;/DIV&gt;
&lt;DIV&gt;dataset_with_target = spark.createDataFrame(dataset_with_target)&lt;/DIV&gt;
&lt;DIV&gt;feature_lookup = FeatureLookup(table_name=feature_table_catalog_path, lookup_key="id", timestamp_lookup_key="date", lookback_window=dt.timedelta(days=3))&lt;/DIV&gt;
&lt;DIV&gt;training_dataset = fe.create_training_set(df=dataset_with_target, feature_lookups=[feature_lookup], label='target')&lt;/DIV&gt;
&lt;DIV&gt;training_dataset.load_df().show()&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P&gt;```&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2024 10:39:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/feature-store-lookback-window-does-not-work-with-primary-keys-of/m-p/82021#M3548</guid>
      <dc:creator>Kjetil</dc:creator>
      <dc:date>2024-08-06T10:39:18Z</dc:date>
    </item>
    <item>
      <title>Re: Feature Store - lookback_window does not work with primary keys of "date" type</title>
      <link>https://community.databricks.com/t5/machine-learning/feature-store-lookback-window-does-not-work-with-primary-keys-of/m-p/82360#M3556</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/105685"&gt;@Kjetil&lt;/a&gt;, this seems related to how date types are handled. Calling `.date()` strips the time component, so Spark infers a DateType column instead of a TimestampType one, which might interfere with time-based lookups.&lt;/P&gt;
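&lt;P&gt;A minimal sketch of the arithmetic behind the expected NULL, in plain Python without Spark or the Feature Store client. The helper `in_lookback_window` is hypothetical and only illustrates the intended semantics (an inclusive window is assumed); it is not how the library implements the lookup.&lt;/P&gt;

```python
import datetime as dt

# Values taken from the reproduction in the original post.
feature_date = dt.datetime(2024, 1, 1).date()   # datetime.date: time component stripped
lookup_time = dt.datetime(2024, 1, 7)           # datetime.datetime: full timestamp
lookback_window = dt.timedelta(days=3)


def in_lookback_window(feature_time, lookup_time, window):
    """Hypothetical helper: True if feature_time lies in [lookup_time - window, lookup_time]."""
    # A plain date cannot be compared with a datetime, so promote it to midnight first.
    if not isinstance(feature_time, dt.datetime):
        feature_time = dt.datetime.combine(feature_time, dt.time.min)
    return lookup_time >= feature_time >= lookup_time - window


# The feature row is 6 days older than the lookup time, so a 3-day window excludes it.
row_is_visible = in_lookback_window(feature_date, lookup_time, lookback_window)
print(type(feature_date).__name__, row_is_visible)  # prints: date False
```

&lt;P&gt;Under these assumptions the row should be excluded, which matches the NULL that the original post expected.&lt;/P&gt;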
&lt;P&gt;To address this, keep the full datetime value (do not call `.date()`) so that the column is stored as a timestamp. Also make sure your Feature Store client and Databricks runtime are up to date.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Aug 2024 11:00:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/feature-store-lookback-window-does-not-work-with-primary-keys-of/m-p/82360#M3556</guid>
      <dc:creator>Retired_mod</dc:creator>
      <dc:date>2024-08-08T11:00:55Z</dc:date>
    </item>
    <item>
      <title>Re: Feature Store - lookback_window does not work with primary keys of "date" type</title>
      <link>https://community.databricks.com/t5/machine-learning/feature-store-lookback-window-does-not-work-with-primary-keys-of/m-p/82364#M3557</link>
      <description>&lt;P&gt;Thank you for answering. Yes, that is also what I found. In other words, the lookback_window argument only works when the time series primary key is a timestamp, not a date. I cannot find this behavior described in the documentation.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Aug 2024 11:10:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/feature-store-lookback-window-does-not-work-with-primary-keys-of/m-p/82364#M3557</guid>
      <dc:creator>Kjetil</dc:creator>
      <dc:date>2024-08-08T11:10:18Z</dc:date>
    </item>
  </channel>
</rss>

