Hi,
Context
I'm looking for help getting Unity Catalog Feature Lookup to work with my model the way I need it to.
I have a trained Darts time series model whose `.predict()` method takes both the history of the variable in question and `n`, the number of time steps to forecast ahead from the end of that history.
So if I have a time series of daily widget sales up to 30th November and pass `n=7`, I'd get predictions for the period 1st-7th December.
The model is a Darts regression model that uses a number of past observations of the target variable as features for predicting future values. Darts handles this feature construction itself; you just need to provide the target variable history in the `predict()` call. (At inference time that history will have evolved since training, but we still want to use the same trained model, just with the updated history.)
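To make the date arithmetic concrete, here is a minimal sketch in plain Python (no Darts involved). `forecast_dates` is a hypothetical helper and the year 2024 is an arbitrary assumption; it only shows which dates a horizon of `n` covers, given the last date in the history.

```python
from datetime import date, timedelta


def forecast_dates(history_end: date, n: int) -> list:
    """Return the n daily dates that follow the last observed date."""
    return [history_end + timedelta(days=i) for i in range(1, n + 1)]


# History runs up to 30th November (year chosen arbitrarily for illustration).
dates = forecast_dates(date(2024, 11, 30), n=7)
print(dates[0], dates[-1])  # 2024-12-01 2024-12-07
```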
Databricks
On Databricks, I have the daily widget sales volumes set up as a time series Feature Table, and I want to use the Feature Lookup to populate the required history at inference time, so that it can then be passed to the model.
I wrapped my trained Darts model in an MLflow pyfunc `PythonModel`, whose `predict()` method similarly takes a pandas DataFrame containing the target variable history and passes it to the underlying Darts model. I then logged this model, along with the Feature Table lookup, using the Databricks Feature Engineering client, so that looking up the history could be handled by the Feature Lookup behind the scenes at inference time.
The issue I'm having is that, when I now try to use this model for inference, I have to provide the set of dates I want the feature lookup to run on, which in this case are dates in the past. Say I'm running inference on 1st December: I need to pass the dates 1st-30th November (as the model uses 30 days' past values of widget sales as its features) to the Feature Engineering client's `score_batch()` method.
This will then successfully perform the Feature Table lookup, to get the widget sales volumes for those dates, and pass that to the Darts model. But the predictions produced by the Darts model are for a different set of dates, i.e. 1st-7th December.
So I'm getting the following error when I try this:
pyspark.errors.exceptions.base.PySparkRuntimeError: [RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF] The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was 7 and the length of input was 30.
I guess this is because I passed 30 rows to the `score_batch()` method (i.e. the dates I needed for the target value history lookup), and only got back 7 rows as predictions (due to the chosen forecast horizon). But Databricks expects these numbers to be equal, and wants to ascribe each prediction back to a single row of the input to `score_batch()`?
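The mismatch can be reproduced without Spark. The sketch below is plain Python, not the Databricks or PySpark API: `apply_row_aligned` is a hypothetical stand-in that enforces the same rule as the error message (one output per input row), `naive_forecast` is a placeholder for the Darts model, and the sales values are made up.

```python
def naive_forecast(history: list, n: int) -> list:
    """Placeholder model: repeat the last observed value n times."""
    return [history[-1]] * n


def apply_row_aligned(rows: list, fn) -> list:
    """Mimic a row-aligned contract: len(output) must equal len(input)."""
    out = fn(rows)
    if len(out) != len(rows):
        raise ValueError(
            f"output length was {len(out)} but input length was {len(rows)}"
        )
    return out


history = [100.0 + day for day in range(30)]  # 30 daily rows of made-up sales

# 30 rows in, 7 predictions out: violates the contract.
try:
    apply_row_aligned(history, lambda rows: naive_forecast(rows, n=7))
    mismatch = None
except ValueError as err:
    mismatch = str(err)

# One row in (the whole history packed into it), one row out (a list of 7).
packed = apply_row_aligned(
    [history], lambda rows: [naive_forecast(h, n=7) for h in rows]
)
```

Packing the history into a single input row keeps the counts equal in this toy version, but whether the Feature Lookup can build such a row is exactly what I don't know.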
I don't get this issue if I happen to also set the model to only need the same length of history as the desired forecast horizon -- i.e. if the model only uses 7 days' past history to predict 7 days into the future. However, even there there's an issue, since it assigns the 7 predictions to the 7 original input rows, and so the associated dates are incorrect. So the prediction for 1st December is assigned to the input row for 24th November, etc.
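The misalignment in the 7-in, 7-out case can be shown with simple date arithmetic. This is an illustrative sketch in plain Python with an arbitrary year; it involves no model and only tracks which date each prediction ends up attached to.

```python
from datetime import date, timedelta

# Input rows passed for the lookup: 24th-30th November.
input_dates = [date(2024, 11, 24) + timedelta(days=i) for i in range(7)]
# Dates the 7 predictions actually refer to: 1st-7th December.
target_dates = [input_dates[-1] + timedelta(days=i) for i in range(1, 8)]

# Positional assignment attaches each prediction to an input row's date.
assigned = list(zip(input_dates, target_dates))
# Gap in days between the row a prediction lands on and its true date.
offsets = {(target - row).days for row, target in assigned}
```

Here every prediction lands on a row dated exactly 7 days before the date it is really for.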
But that's not a sustainable solution anyway, since in general the length of history used and forecast horizon will be different.
So is there a way to get the Feature Lookups to do what I need? Which is something like:
* Look up the widget sales volume history as-of a particular point in time. (So as-of 1st December, give me all the data up to and including 30th November.)
* Pass that history to the Darts model's predict method, which will then return projections for the future, from the end date of that history. (I.e. 1st-7th December, if forecast horizon 7).
* Return the predictions, and (if possible) also the history as a separate DataFrame, back to the user when they call `score_batch()`.
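The three steps above can be sketched in plain Python as follows. Nothing here is a Databricks API: `lookup_history_as_of`, `naive_forecast` and `score_as_of` are hypothetical names, the sales table is an in-memory dict with made-up values, and the forecaster is a placeholder for the Darts model.

```python
from datetime import date, timedelta


def lookup_history_as_of(table: dict, as_of: date, window: int) -> list:
    """Return up to `window` (date, value) pairs strictly before `as_of`."""
    days = sorted(d for d in table if d < as_of)[-window:]
    return [(d, table[d]) for d in days]


def naive_forecast(values: list, n: int) -> list:
    """Placeholder for the Darts model: repeat the last value n times."""
    return [values[-1]] * n


def score_as_of(table: dict, as_of: date, window: int, n: int):
    """Look up history as-of a date, forecast n days ahead, return both."""
    history = lookup_history_as_of(table, as_of, window)
    last_day = history[-1][0]
    preds = naive_forecast([v for _, v in history], n)
    forecast = [(last_day + timedelta(days=i + 1), p) for i, p in enumerate(preds)]
    return forecast, history


# Made-up daily sales for 1st October to 5th December (year arbitrary).
start = date(2024, 10, 1)
sales = {start + timedelta(days=i): 100.0 + i for i in range(66)}

forecast, history = score_as_of(sales, as_of=date(2024, 12, 1), window=30, n=7)
```

The caller supplies a single as-of date rather than 30 lookup dates, and gets back 7 forecast rows dated 1st-7th December plus the 30 history rows that were used.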
I considered trying something like table-valued functions to look up the history, but as far as I can tell, the Feature Lookup support on Unity Catalog is only for scalar-valued functions (e.g. the docs here say UDTFs can't be registered on Unity Catalog).