Hi,
I need to join three streams (streamed data sets) in a DLT pipeline. From a Kinesis data stream I read a sequence of events per group key. The logically first event of each group contains a marker that determines whether the group is relevant for me or not. These events cannot reasonably be windowed, since they can be weeks or even years apart. Conceptually: in order to process an event, I have to look up the first event of its group and check whether the marker is set. Later on, I have to join the relevant events on the group key with the events of a second and a third stream.
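To make the lookup logic concrete, here is a minimal plain-Python sketch of what I mean. It is not DLT code. The field names `group_key`, `seq` and `marker`, the convention that `seq == 0` identifies the logically first event, and the function name `filter_relevant` are all made up for illustration. The sketch also assumes the first event of a group arrives before the rest of that group, which may not hold in practice.

```python
from typing import Dict, Iterable, Iterator


def filter_relevant(
    events: Iterable[dict], relevant_groups: Dict[str, bool]
) -> Iterator[dict]:
    """Yield only events whose group was marked relevant by its first event.

    `relevant_groups` plays the role of the lookup structure (group key -> flag).
    All field names here are hypothetical.
    """
    for event in events:
        key = event["group_key"]
        if event["seq"] == 0:
            # Logically first event of the group: remember its marker.
            relevant_groups[key] = bool(event.get("marker"))
        # Per-event lookup; groups never seen before are treated as not relevant.
        if relevant_groups.get(key, False):
            yield event


if __name__ == "__main__":
    lookup: Dict[str, bool] = {}
    sample = [
        {"group_key": "a", "seq": 0, "marker": True},
        {"group_key": "b", "seq": 0, "marker": False},
        {"group_key": "a", "seq": 1},
        {"group_key": "b", "seq": 1},
    ]
    for e in filter_relevant(sample, lookup):
        print(e["group_key"], e["seq"])
```

The dictionary stands in for whatever keyed lookup store would be used in the real pipeline; the events that pass this filter are the ones I would then join with the second and third stream.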
Doing this in DLT streaming tables will probably be too slow, since, as far as I understand, row-based lookups are probably not supported.
Is there a best practice for using a fast, maintainable lookup structure, such as a Postgres table holding just the group keys, against which I can do a lookup per event?
Thank you in advance.