Feature Store : for sklearn flavored models, are timestamps fully supported?

thib
New Contributor III

I have created a feature table (Databricks runtime ML 10.2) that includes a timestamp column as a primary key, that is not used as a feature but as a column to join on.

I then created a model that trains on this feature table plus some additional data, with the primary keys excluded. I tried excluding them both through the Feature Store API and through the sklearn API. The model trains fine, but when I use the score_batch() method, I get the error "TypeError: float() argument must be a string or a number, not 'Timestamp'".

This error is coming from sklearn, so is there some incompatibility there, or is this a bug in feature store?

Steps to reproduce :

  • create feature table with one column as timestamp type
  • train a model using sklearn that does not use that timestamp column
  • use score_batch() method and visualize results
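The error message itself can be reproduced without Feature Store. The following is a minimal sketch, assuming the scoring step ends up passing a pandas DataFrame that still contains the timestamp key to a float conversion similar to the one sklearn performs on its inputs. The helper to_model_input and the column names ts and x are hypothetical and only serve the illustration; this does not show what score_batch() does internally.

```python
import numpy as np
import pandas as pd


def to_model_input(df, drop=()):
    """Convert a DataFrame to a float matrix, roughly as a numeric estimator would.

    Hypothetical helper for illustration; `drop` lists columns to exclude first.
    """
    return np.asarray(df.drop(columns=list(drop)), dtype=np.float64)


# Hypothetical data: "ts" is a join key, "x" is the only real feature.
frame = pd.DataFrame({
    "ts": pd.to_datetime(["2022-01-01", "2022-01-02"]),
    "x": [1.0, 2.0],
})

try:
    # The timestamp column is still present, so the float conversion fails.
    to_model_input(frame)
except TypeError as err:
    print("conversion failed:", err)

# Dropping the key column before conversion avoids the error.
features = to_model_input(frame, drop=["ts"])
print(features.shape)
```

If this is what happens here, the timestamp column would be reaching the model at scoring time even though it was excluded at training time.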
1 ACCEPTED SOLUTION

Accepted Solutions

Hubert-Dudek
Esteemed Contributor III

Maybe you can just try casting the timestamp to int.

6 REPLIES

thib
New Contributor III

Thanks for your reply Hubert. Yes, casting it to long or int does solve the issue, but it is a workaround and I would like to keep the data as-is, with directly interpretable timestamps, especially when there is no reason why they should trigger an error during the prediction step since it is not being used at that stage.
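For readers who take the casting route, here is a minimal sketch in pandas, assuming timezone-naive timestamps and that whole-second resolution is enough. The helper names are hypothetical. On a Spark DataFrame the equivalent step would be casting the column to long, as described above. The key stays interpretable because the integer can be converted back to a timestamp when needed.

```python
import pandas as pd


def timestamp_to_epoch_seconds(series):
    """Convert timezone-naive timestamps to integer seconds since 1970-01-01."""
    return (series - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)


def epoch_seconds_to_timestamp(series):
    """Convert integer epoch seconds back to timestamps for display."""
    return pd.to_datetime(series, unit="s")


# Hypothetical key column, as it might appear in the feature table.
original = pd.Series(pd.to_datetime(["1970-01-02", "2022-01-01"]))

as_int = timestamp_to_epoch_seconds(original)    # store this as the key
restored = epoch_seconds_to_timestamp(as_int)    # recover readable values

print(as_int.tolist())
print(restored.tolist())
```

This does not remove the underlying limitation; it only keeps the key in a type that the scoring step accepts.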

Kaniz
Community Manager

Hi @Thibault Daoulas, Databricks released Runtime ML 10.2 in December 2021. Here are the important improvements. You can also refer to the documentation here.

Databricks Runtime ML includes AutoML, a tool to automatically train machine learning pipelines.

  • AutoML ignores columns that have only a single value.
  • For classification and regression problems, the time column used to split the dataset into training, validation, and test sets chronologically can now be string type. Previously only timestamp and integer were supported. See Control the train/validation/test split for details.

The FeatureStoreClient interface has been simplified.

  • FeatureStoreClient.create_feature_table() has been deprecated. Instead, use FeatureStoreClient.create_table().
  • FeatureStoreClient.get_feature_table() has been deprecated. Instead, use FeatureStoreClient.get_table().
  • All arguments to FeatureStoreClient.publish_table() other than name and online_store must be passed as keyword arguments.

For more information, see Work with feature tables and Databricks Feature Store Python API.

Hi @Thibault Daoulas,

Did @Kaniz Fatma's response help resolve your question? If yes, please mark it as the best response. If not, please let us know.

thib
New Contributor III

Hi, it did not, but at least I now know that timestamps are not fully supported, so the workaround is to avoid them. I suppose you can mark this as resolved.

Kaniz
Community Manager

Thank you @Thibault Daoulas for the update. Can you mark whichever answer you feel is the best?
