I have a table with a timestamp column (t), a key (k), and a list of value columns (v) for which I would like to compute the difference over time, independently for each k: v_diff(t) = v(t) - v(t-1).
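To make the target concrete, here is a minimal plain-Python sketch (no Spark involved) of the result I am after. The sample rows and the helper name `diff_per_key` are made up purely for illustration. The first row of each key has no predecessor, so its difference is `None`, which is what `lag` would give as null.

```python
# Hypothetical sample rows as (k, t, v); the values are invented for illustration.
rows = [
    ("a", 1, 10.0),
    ("b", 1, 5.0),
    ("a", 2, 13.0),
    ("b", 2, 4.0),
    ("a", 3, 11.0),
]

def diff_per_key(rows):
    """Return (k, t, v, v_diff) with v_diff = v(t) - v(t-1) within each key k."""
    last_v = {}  # most recent v seen so far for each key
    out = []
    # Order by key, then by timestamp, so each key is walked in time order.
    for k, t, v in sorted(rows, key=lambda r: (r[0], r[1])):
        prev = last_v.get(k)
        out.append((k, t, v, None if prev is None else v - prev))
        last_v[k] = v
    return out

for row in diff_per_key(rows):
    print(row)
```

Each key only ever looks at its own previous value, so rows from different keys never influence each other.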
Normally I would write:
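from pyspark.sql import Window
import pyspark.sql.functions as F
# KEY_COLS (assumed name) holds the key column(s) k; partition by the key, not by the columns being diffed
lag_window = Window.partitionBy(KEY_COLS).orderBy('timestamp')
for col in COLS_TO_DIFF:
    df = df.withColumn(
        col + "_diff",
        df[col] - F.lag(df[col]).over(lag_window))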
But on a streaming DataFrame this fails with:
AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets;
So my question is: how can I compute this per-key difference on a streaming DataFrame?