Sample Input data set
ClusterId | Event | EventTime |
1212-18-r9u1kzn1 | RUNNING | 2024-02-02T11:38:30.168+00:00 |
1212-18-r9u1kzn1 | TERMINATING | 2024-02-02T13:43:33.933+00:00 |
1212-18-r9u1kzn1 | STARTING | 2024-02-02T15:50:05.174+00:00 |
1212-18-r9u1kzn1 | RUNNING | 2024-02-02T15:54:21.510+00:00 |
1212-18-r9u1kzn1 | RUNNING | 2024-02-02T16:09:20.576+00:00 |
1212-18-r9u1kzn1 | RUNNING | 2024-02-02T16:19:58.744+00:00 |
1212-18-r9u1kzn1 | TERMINATING | 2024-02-02T17:18:33.863+00:00 |
1212-18-r9u1kzn1 | STARTING | 2024-02-02T17:22:38.635+00:00 |
1212-18-r9u1kzn1 | RUNNING | 2024-02-02T17:23:40.781+00:00 |
1212-18-r9u1kzn1 | TERMINATING | 2024-02-02T18:03:33.953+00:00 |
1212-18-r9u1kzn1 | STARTING | 2024-02-02T21:10:21.651+00:00 |
1212-18-r9u1kzn1 | RUNNING | 2024-02-02T21:13:59.842+00:00 |
1212-18-r9u1kzn1 | TERMINATING | 2024-02-02T22:43:34.022+00:00 |
Below is sample expected output. In this RunningEventTime will show the event time corresponding to the previous running event time for the event "TERMINATING". In case "STARTING" event is present then for that event time should be showing in "StartingEventTime" column.
ClusterId | Event | EventTime | RunningEventTime | StartingEventTime |
1212-18-r9u1kzn1 | RUNNING | 2024-02-02T11:38:30.168+00:00 | | |
1212-18-r9u1kzn1 | TERMINATING | 2024-02-02T13:43:33.933+00:00 | 2024-02-02T11:38:30.168+00:00 | |
1212-18-r9u1kzn1 | STARTING | 2024-02-02T15:50:05.174+00:00 | | |
1212-18-r9u1kzn1 | RUNNING | 2024-02-02T15:54:21.510+00:00 | | |
1212-18-r9u1kzn1 | RUNNING | 2024-02-02T16:09:20.576+00:00 | | |
1212-18-r9u1kzn1 | RUNNING | 2024-02-02T16:19:58.744+00:00 | | |
1212-18-r9u1kzn1 | TERMINATING | 2024-02-02T17:18:33.863+00:00 | | 2024-02-02T15:50:05.174+00:00 |
1212-18-r9u1kzn1 | STARTING | 2024-02-02T17:22:38.635+00:00 | | |
1212-18-r9u1kzn1 | RUNNING | 2024-02-02T17:23:40.781+00:00 | | |
1212-18-r9u1kzn1 | TERMINATING | 2024-02-02T18:03:33.953+00:00 | | 2024-02-02T17:22:38.635+00:00 |
1212-18-r9u1kzn1 | STARTING | 2024-02-02T21:10:21.651+00:00 | | |
1212-18-r9u1kzn1 | RUNNING | 2024-02-02T21:13:59.842+00:00 | | |
1212-18-r9u1kzn1 | TERMINATING | 2024-02-02T22:43:34.022+00:00 | | 2024-02-02T21:10:21.651+00:00 |
I tried few option such using self join but that is not ideal when data set is large. Another option i tried is looping but here also same problem. It will not be good for large data sets. I tried windowing function "lag" but could not make it work. Any suggestion or hint would be really helpful.