how to compute difference over time of a spark structure streaming?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-18-2023 07:48 AM
I have a table with a timestamp column (t) and a list of columns for which I would like to compute the difference over time (v), by some key(k): v_diff(t) = v(t)-v(t-1) for each k independently.
Normally I would write:
lag_window = Window.partitionBy(COLS_TO_DIFF).orderBy('timestamp')
for col in COLS_TO_DIFF:
df = df.withColumn(
col + "_diff",
df[col] - F.lag(df[col]).over(lag_window))
But
AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets;
So, my question is how do I compute what I need?
- Labels:
-
Column
-
Difference
-
Spark Structure
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-22-2023 03:54 AM
I asked it also in Stack-overflow since I don't get an answer here https://stackoverflow.com/questions/75161849/spark-structure-streaming-differentiate-over-time
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
02-08-2023 05:32 AM
I found this but could not make it work https://www.databricks.com/blog/2022/10/18/python-arbitrary-stateful-processing-structured-streaming...
![](/skins/images/8C2A30E5B696B676846234E4B14F2C7B/responsive_peak/images/icon_anonymous_message.png)
![](/skins/images/8C2A30E5B696B676846234E4B14F2C7B/responsive_peak/images/icon_anonymous_message.png)