Databricks Community

jonathan-dufaul · ‎11-23-2022

I have a pyfunc model that I can use to get predictions. It takes time series data with context information at each date, and produces a string of predictions. For example:

The data is set up like below (temperature/pressure/output are stand-ins for my actual input columns):

date,sales,temperature,pressure,output
01-09-2020,100,101,5000,10
01-10-2020,120,91,4000,24
01-11-2020,50,110,6000,30

Let's say the model is trained using a window size of 60 and a prediction interval of 14. Then you provide the model with 60 records, and it returns 14 predictions starting the day after the last date in your input dataset.

The return is just of the form:

date,prediction
01-12-2022,81
01-13-2022,60
01-14-2022,111
...

with N records (14 in our example). It works brilliantly if I augment the data myself and work with the predict function.
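To make the shape of the problem concrete, here is a minimal sketch of the "60 rows in, 14 rows out" contract described above. The function `predict_window` is a hypothetical stand-in for the pyfunc model's predict, and the forecast logic (repeating the window's mean sales) is only a placeholder, not the real model.

```python
from datetime import date, timedelta
from statistics import mean

WINDOW_SIZE = 60          # rows the model expects as input (from the post)
PREDICTION_INTERVAL = 14  # rows the model returns (from the post)


def predict_window(rows):
    """Hypothetical stand-in for the model: K input rows -> N output rows."""
    if len(rows) != WINDOW_SIZE:
        raise ValueError(f"expected {WINDOW_SIZE} rows, got {len(rows)}")
    last_date = max(r["date"] for r in rows)
    # Placeholder forecast: repeat the mean of the window's sales.
    forecast = mean(r["sales"] for r in rows)
    # Predictions start the day after the last input date.
    return [
        {"date": last_date + timedelta(days=step), "prediction": forecast}
        for step in range(1, PREDICTION_INTERVAL + 1)
    ]


# Synthetic history: 60 consecutive daily rows starting 01-09-2020.
start = date(2020, 1, 9)
history = [
    {"date": start + timedelta(days=i), "sales": 100 + i,
     "temperature": 101, "pressure": 5000}
    for i in range(WINDOW_SIZE)
]

predictions = predict_window(history)
print(len(history), "rows in ->", len(predictions), "rows out")
```

The point is only that the output row count (14) is unrelated to the input row count (60), which is what a one-row-per-prediction scoring API does not obviously accommodate.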

Does the feature store support this? score_batch doesn't seem to be able to return arbitrarily or differently shaped data. I could try making the data wide, but then that would defeat the purpose of trying to use the feature store.

don't know if I'm making sense.
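For reference, the "making the data wide" workaround mentioned above could look roughly like the sketch below: each window of rows is flattened into a single record so that one row corresponds to one prediction call. The helper `to_wide`, the `series_id` key, and the `<feature>_lag_<n>` column names are all hypothetical, and a window of 3 rows is used instead of 60 to keep the example short.

```python
from datetime import date


def to_wide(rows, feature_names, key):
    """Flatten one window of rows into a single wide record.

    Column names like "sales_lag_0" are a hypothetical naming scheme;
    lag 0 is the most recent row in the window.
    """
    ordered = sorted(rows, key=lambda r: r["date"], reverse=True)
    wide = dict(key)
    wide["window_end"] = ordered[0]["date"]
    for lag, row in enumerate(ordered):
        for name in feature_names:
            wide[f"{name}_lag_{lag}"] = row[name]
    return wide


# The three sample rows from the question, with dates parsed as MM-DD-YYYY.
window = [
    {"date": date(2020, 1, 9), "sales": 100, "temperature": 101, "pressure": 5000},
    {"date": date(2020, 1, 10), "sales": 120, "temperature": 91, "pressure": 4000},
    {"date": date(2020, 1, 11), "sales": 50, "temperature": 110, "pressure": 6000},
]

wide_row = to_wide(
    window,
    feature_names=["sales", "temperature", "pressure"],
    key={"series_id": "store_1"},  # hypothetical lookup key
)
print(wide_row)
```

With the real window of 60 rows and three features this produces 180 feature columns per record, which illustrates why the wide layout is unattractive here.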

shan_chandra · ‎04-25-2023

@Jonathan Dufault - When you use the model for inference, you can choose to have it retrieve feature values from the Feature Store. Could you please try the time series feature example below and see if it works? https://docs.databricks.com/_extras/notebooks/source/machine-learning/feature-store-time-series-exam...

jonathan-dufaul · ‎04-26-2023

I think you might have misinterpreted the question. The question is a technical one about the score_batch function. Specifically, how can I use score_batch when the input is K rows and the return is L rows (not 1 row = 1 prediction)? Did I phrase it confusingly?

The only overlap I see with the notebook you provided is that both my data and the notebook use time series data, and that the notebook has an example of using score_batch, albeit in the way I already mentioned in the question. I don't even think that model uses lagged data in any form. Am I missing something subtle?

luis_herrera · ‎04-28-2023

Yes, FeatureStoreClient().score_batch supports multidimensional predictions. However, the DataFrame you provide to FeatureStoreClient.score_batch must contain a timestamp column with the same name and DataType as the timestamp_lookup_key of the FeatureLookup provided to FeatureStoreClient.create_training_set.

(Check https://docs.databricks.com/machine-learning/feature-store/time-series.html)
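As an illustration only: if a scored row did carry all of its predictions in a single array-valued column, the long date,prediction form from the question could be recovered as sketched below. The `window_end` and `prediction` field names and the helper `explode_predictions` are assumptions made for this sketch; nothing in this thread confirms that score_batch returns data in this shape.

```python
from datetime import date, timedelta


def explode_predictions(scored_rows):
    """Turn one array-valued prediction per row into one row per prediction.

    Assumes (hypothetically) that each scored row has a "window_end" date
    and a "prediction" list whose i-th element is the forecast for
    window_end + (i + 1) days.
    """
    long_rows = []
    for row in scored_rows:
        for step, value in enumerate(row["prediction"], start=1):
            long_rows.append({
                "date": row["window_end"] + timedelta(days=step),
                "prediction": value,
            })
    return long_rows


# One scored row holding three forecasts (values taken from the question).
scored = [{"window_end": date(2022, 1, 11), "prediction": [81, 60, 111]}]

long_predictions = explode_predictions(scored)
for row in long_predictions:
    print(row["date"].strftime("%m-%d-%Y"), row["prediction"])
```

This only covers the reshaping after scoring; it does not address how the K input rows reach the model in the first place.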

PS: Check #DAIS2023 talks!

jonathan-dufaul · ‎05-01-2023

If I'm reading this right, this is identical to what shan responded above. I have responded to that explaining why this is attempting to answer a different question that is not the one I posed.