How to compute the difference over time in Spark Structured Streaming?

chanansh
Contributor

I have a table with a timestamp column (t) and a list of columns (v) for which I would like to compute the difference over time, grouped by some key (k): v_diff(t) = v(t) - v(t-1), computed for each k independently.
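To make the definition concrete, here is a small plain-Python sketch (no Spark involved) of the intended semantics on a static set of rows. The function name `diff_by_key` and the sample rows are hypothetical. As with `F.lag`, the first row of each key has no predecessor, so its difference is None.

```python
from collections import defaultdict


def diff_by_key(rows):
    """Return (k, t, v, v_diff) tuples with v_diff = v(t) - v(t-1) per key k.

    rows: iterable of (k, t, v) tuples in any order.
    The first row of each key gets None, mirroring F.lag returning null.
    """
    by_key = defaultdict(list)
    for k, t, v in rows:
        by_key[k].append((t, v))

    out = []
    for k in sorted(by_key):
        prev = None
        # Within a key, difference consecutive rows in time order.
        for t, v in sorted(by_key[k]):
            out.append((k, t, v, None if prev is None else v - prev))
            prev = v
    return out


rows = [("a", 1, 10), ("b", 1, 5), ("a", 2, 13), ("b", 2, 4), ("a", 3, 20)]
print(diff_by_key(rows))
# [('a', 1, 10, None), ('a', 2, 13, 3), ('a', 3, 20, 7), ('b', 1, 5, None), ('b', 2, 4, -1)]
```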

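Normally, on a static DataFrame, I would write:

from pyspark.sql import Window
from pyspark.sql import functions as F

# Partition by the key column k, order by the timestamp column t
lag_window = Window.partitionBy('k').orderBy('timestamp')

for col in COLS_TO_DIFF:
  # v_diff(t) = v(t) - v(t-1) within each key
  df = df.withColumn(
    col + "_diff",
    df[col] - F.lag(df[col]).over(lag_window))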

But on a streaming DataFrame this fails with:

AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets;

So how can I compute these per-key differences on a streaming DataFrame?
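In a stream, the previous value v(t-1) for a key may have arrived in an earlier micro-batch, so a lag computed inside a single batch (for example inside `foreachBatch`) would return null for the first row of each key in every batch. A stateful approach instead keeps the last (t, v) per key between batches. The sketch below is plain Python, independent of Spark, and only illustrates that state logic; the class name `StreamingDiff` and the policy of dropping rows that are not newer than the last one seen are hypothetical choices, not a Spark API. In PySpark 3.4+, `GroupedData.applyInPandasWithState` is one operator that can hold per-key state of this kind, but wiring it up is not shown here.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Row = Tuple[Hashable, float, float]  # (k, t, v)


class StreamingDiff:
    """Hypothetical helper: per-key v(t) - v(t-1) across micro-batches.

    self.state maps each key k to its last seen (t, v); this is the part a
    streaming stateful operator would have to persist between batches.
    """

    def __init__(self) -> None:
        self.state: Dict[Hashable, Tuple[float, float]] = {}

    def process_batch(
        self, batch: Iterable[Row]
    ) -> List[Tuple[Hashable, float, float, Optional[float]]]:
        out = []
        # Sort by (k, t) so each key's rows are differenced in time order.
        for k, t, v in sorted(batch, key=lambda r: (r[0], r[1])):
            last = self.state.get(k)
            if last is not None and t <= last[0]:
                # Assumed policy: skip late or duplicate rows for this key.
                continue
            diff = None if last is None else v - last[1]
            out.append((k, t, v, diff))
            self.state[k] = (t, v)
        return out


sd = StreamingDiff()
print(sd.process_batch([("a", 1, 10), ("b", 1, 5)]))
# [('a', 1, 10, None), ('b', 1, 5, None)]
print(sd.process_batch([("a", 2, 13), ("b", 2, 4), ("a", 3, 20)]))
# [('a', 2, 13, 3), ('a', 3, 20, 7), ('b', 2, 4, -1)]
```

The second batch shows the point of keeping state: the differences for t=2 use values that arrived in the first batch, which a lag confined to one micro-batch could not see.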

2 REPLIES

chanansh
Contributor

I have also asked this on Stack Overflow, since I haven't received an answer here: https://stackoverflow.com/questions/75161849/spark-structure-streaming-differentiate-over-time

chanansh
Contributor