01-06-2023 09:14 AM
I'm using Databricks Feature Store == 0.6.1.
After I register my feature table with `create_feature_table` and write data with `write_table`, I want to read that feature table based on filter conditions (for example, on a timestamp column) without calling `create_training_set`. I'd like to do this for both training and batch inference.
I found the `read_table` function for this, but I'm not sure how to pass filter conditions in the call.
Ideally, I'd also like to read a single feature row from the online store by passing some entity keys. I couldn't find any documentation on reading from the offline and online stores that covers my use case.
Any help is much appreciated. Thanks.
01-06-2023 02:54 PM
`create_training_set` is essentially a select from Delta tables; all feature tables are just registered Delta tables. Here is example code that I used to handle that:
customer_features_df = spark.sql("SELECT * FROM recommender_system.customer_features")
product_features_df = spark.sql("SELECT * FROM recommender_system.product_features")
training_set_df = training_df.join(  # training_df: your existing labels/transactions DataFrame
    customer_features_df,
    on=[training_df.cid == customer_features_df.customer_id,
        training_df.transaction_dt == customer_features_df.dt],
    how="inner",
).join(
    product_features_df,
    on="product_id",
    how="inner",
)
01-06-2023 03:03 PM
Thanks, Hubert. So you mean that if I want to read a feature table separately, I just run a regular SQL select statement on it, as if it were a normal Delta table?
So `read_table` is not needed in this case?
01-07-2023 04:15 AM
yes
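To make the filtered read concrete, here is a minimal sketch that treats the feature table as a regular Delta table and keeps only rows in a date range. The helper names `build_filter_query` and `read_features_between` are hypothetical, the table and column names are borrowed from the example above, and `spark` is assumed to be the session a Databricks notebook already provides.

```python
from datetime import date


def build_filter_query(table: str, ts_col: str, start: date, end: date) -> str:
    """Build a SELECT that keeps rows where start <= ts_col < end.

    Hypothetical helper; table and ts_col are assumed to be trusted
    identifiers (they are interpolated directly into the SQL text).
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE {ts_col} >= DATE '{start.isoformat()}' "
        f"AND {ts_col} < DATE '{end.isoformat()}'"
    )


def read_features_between(spark, table: str, ts_col: str, start: date, end: date):
    """Run the filtered SELECT on an existing SparkSession and return the result."""
    return spark.sql(build_filter_query(table, ts_col, start, end))


# Example usage in a notebook, where `spark` already exists:
# jan_features_df = read_features_between(
#     spark, "recommender_system.customer_features", "dt",
#     date(2023, 1, 1), date(2023, 2, 1),
# )
```

If you prefer to stay with the Feature Store client, the same filter should also work on the result of `read_table`, assuming it returns a Spark DataFrame, for example `fs.read_table(name).filter("dt >= '2023-01-01'")`.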
01-19-2023 05:27 PM
Along similar lines, I'm struggling to understand one concept about feature tables.
If I can read a feature table directly with SQL and filter it to the dates of my choice, then how is Databricks Feature Store different from a "data mart" that is organized by time?
Similarly, with feature versioning, every time I want to read a different set of features from the offline store, I just pass different column names. How is that different from a regular "select" statement in SQL or on a DataFrame?
I'm struggling to justify the value of using Databricks Feature Store to my team when they say, "it's just another data mart". I have an intuition that it's not, but I can't give proper reasoning.