Does FeatureStoreClient().score_batch support multidimensional predictions?

jonathan-dufaul
Valued Contributor

I have a pyfunc model that I can use to get predictions. It takes time series data with context information at each date and produces a sequence of predictions. For example:

The data is set up as below (temperature/pressure/output differ from my actual input columns):

date,sales,temperature,pressure,output
01-09-2020,100,101,5000,10
01-10-2020,120,91,4000,24
01-11-2020,50,110,6000,30

Let's say the model is trained with a window size of 60 and a prediction interval of 14. You then give the model 60 records, and it returns 14 predictions, starting from the day after the last date in your input data.

The output is simply of the form:

date,prediction
01-12-2022,81
01-13-2022,60
01-14-2022,111
...

with N records (14 in our example). It works brilliantly if I prepare the data myself and call the predict function directly.
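A minimal sketch of the input/output contract described above, assuming records are (date, sales) pairs with MM-DD-YYYY dates as in the sample. `window_forecast` is a hypothetical name, and the "model" is a placeholder (the mean of the window) standing in for the real pyfunc model; only the shapes matter here: 60 rows in, 14 dated rows out.

```python
from datetime import date, datetime, timedelta

WINDOW_SIZE = 60       # records the model consumes (from the example above)
HORIZON = 14           # predictions it returns (the prediction interval)
DATE_FMT = "%m-%d-%Y"  # the sample dates look like MM-DD-YYYY


def window_forecast(records, window_size=WINDOW_SIZE, horizon=HORIZON):
    """Take the last `window_size` (date, sales) records and return `horizon`
    (date, prediction) pairs starting the day after the last input date.

    Placeholder model: every prediction is the mean sales of the window.
    """
    if len(records) < window_size:
        raise ValueError(f"need at least {window_size} records, got {len(records)}")
    window = records[-window_size:]
    last_date = datetime.strptime(window[-1][0], DATE_FMT).date()
    level = sum(sales for _, sales in window) / window_size  # placeholder model
    return [
        ((last_date + timedelta(days=i)).strftime(DATE_FMT), level)
        for i in range(1, horizon + 1)
    ]


# Usage: 60 daily records in, 14 dated predictions out.
start = date(2020, 1, 9)
history = [((start + timedelta(days=i)).strftime(DATE_FMT), 100.0) for i in range(60)]
preds = window_forecast(history)
print(len(history), "->", len(preds))  # 60 -> 14
print(preds[0])  # ('03-09-2020', 100.0)
```

The point of the sketch is that the number of output rows is set by the horizon, not by the number of input rows.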

Does the Feature Store support this? score_batch doesn't seem to be able to return arbitrarily shaped output. I could try making the data wide, but that would defeat the purpose of using the Feature Store.
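For illustration, the "wide" alternative mentioned above could look roughly like this: pack each window into a single row so a row-wise scorer returns one list of predictions per row, then expand that list back into long (date, prediction) rows. `to_wide`, `predict_row` and `explode_predictions` are hypothetical helpers, the mean-of-window model is a placeholder, and the window size is shrunk to 3 so the sample rows from the post can be used.

```python
from datetime import datetime, timedelta

DATE_FMT = "%m-%d-%Y"


def to_wide(records, window_size):
    """Pack the trailing `window_size` (date, sales) records into ONE row:
    the anchor (last) date plus the list of sales values in the window."""
    window = records[-window_size:]
    return {"anchor_date": window[-1][0], "sales_window": [s for _, s in window]}


def predict_row(row, horizon):
    """Row-wise placeholder model: one wide row in, one list of `horizon`
    predictions out (the window mean stands in for the real model)."""
    values = row["sales_window"]
    return [sum(values) / len(values)] * horizon


def explode_predictions(row, predictions):
    """Turn one row's prediction list back into long (date, prediction) rows,
    dated from the day after the anchor date."""
    anchor = datetime.strptime(row["anchor_date"], DATE_FMT).date()
    return [
        ((anchor + timedelta(days=i + 1)).strftime(DATE_FMT), p)
        for i, p in enumerate(predictions)
    ]


# Usage with the three sample rows from the post (window 3, horizon 2).
sample = [("01-09-2020", 100), ("01-10-2020", 120), ("01-11-2020", 50)]
row = to_wide(sample, 3)
print(explode_predictions(row, predict_row(row, 2)))
# [('01-12-2020', 90.0), ('01-13-2020', 90.0)]
```

With the real window of 60, every scored row would have to carry all 60 values, which is the duplication the question is trying to avoid.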

I don't know if I'm making sense.

shan_chandra
Honored Contributor III

@Jonathan Dufault - When you use the model for inference, you can have it retrieve feature values from the Feature Store. Could you please try the time series example below and see if it works? https://docs.databricks.com/_extras/notebooks/source/machine-learning/feature-store-time-series-exam...

I think you might have misinterpreted the question. It's a technical one about the score_batch function: specifically, how can I use score_batch when the input is K rows and the output is L rows (rather than 1 row = 1 prediction)? Did I phrase it confusingly?
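To make the K-rows-in, L-rows-out mismatch concrete, here is a simplified model of a row-aligned batch scorer. It assumes, based on my reading of the documentation, that score_batch returns the input DataFrame with a prediction column appended (one value per input row); it is not the actual score_batch implementation, and `score_rowwise`, `per_row_model` and `windowed_model` are hypothetical names.

```python
def score_rowwise(rows, model):
    """Simplified row-aligned scorer: attach exactly one prediction to each
    input row, so the model must return as many predictions as there are rows."""
    preds = model(rows)
    if len(preds) != len(rows):
        raise ValueError(
            f"row-aligned scoring expects {len(rows)} predictions, got {len(preds)}"
        )
    return [dict(row, prediction=p) for row, p in zip(rows, preds)]


def per_row_model(rows):
    # 1 row in -> 1 prediction out: fits the row-aligned contract
    return [row["sales"] * 1.1 for row in rows]


def windowed_model(rows, horizon=14):
    # K rows in -> `horizon` predictions out: breaks the row-aligned contract
    mean = sum(row["sales"] for row in rows) / len(rows)
    return [mean] * horizon


rows = [{"sales": float(i)} for i in range(60)]
print(len(score_rowwise(rows, per_row_model)))  # 60
try:
    score_rowwise(rows, windowed_model)
except ValueError as err:
    print(err)  # row-aligned scoring expects 60 predictions, got 14
```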

The only overlap I see with the notebook you provided is that both my data and the notebook use time series data, and the notebook calls score_batch, but only in the one-row-in, one-prediction-out way I already described in the question. I don't think that model uses lagged data in any form. Am I missing something subtle?

luis_herrera
New Contributor III

Yes, FeatureStoreClient().score_batch supports multidimensional predictions. However, the DataFrame you provide to FeatureStoreClient.score_batch must contain a timestamp column with the same name and DataType as the timestamp_lookup_key of the FeatureLookup provided to FeatureStoreClient.create_training_set.
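The requirement above can be expressed as a small pre-flight check before calling score_batch. This is a hypothetical helper, not part of the Feature Store API; it takes the schema as a plain {column: type} dict, which for a PySpark DataFrame could be built with dict(batch_df.dtypes). The column name "date" and type "timestamp" are assumptions for the example.

```python
def check_timestamp_column(batch_schema, key_name, key_type):
    """Return a list of problems (empty if OK) with the batch DataFrame's
    timestamp column, given its schema as {column_name: type_name} and the
    name/type of the FeatureLookup's timestamp_lookup_key."""
    if key_name not in batch_schema:
        return [f"missing timestamp column {key_name!r}"]
    if batch_schema[key_name] != key_type:
        return [f"{key_name!r} has type {batch_schema[key_name]}, expected {key_type}"]
    return []


print(check_timestamp_column({"store_id": "int", "date": "timestamp"}, "date", "timestamp"))  # []
print(check_timestamp_column({"store_id": "int", "date": "string"}, "date", "timestamp"))
# ["'date' has type string, expected timestamp"]
```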

(Check https://docs.databricks.com/machine-learning/feature-store/time-series.html)

PS: Check #DAIS2023 talks!

If I'm reading this right, this is the same as shan's response above. I replied to that explaining why it answers a different question from the one I posed.

EmilAndersson
New Contributor II

I have the same question. I've decided to look for alternative Feature Stores as this makes it very difficult to use for time series forecasting.
