Data Engineering

Streamed DLT Pipeline using a lookup table

Mathias_Peters
Contributor II

Hi, 

I need to join three streams (streamed data sets) in a DLT pipeline. I am reading a sequence of events per group key from a Kinesis data stream. The logically first event of each group contains a marker that determines whether that group is relevant for me or not. These events cannot be windowed reasonably, since they can be weeks or even years apart. Conceptually: in order to process an event, I have to look up the first event of its group and check whether the marker is set. Later on, I have to join the relevant events, on the group key, with events from a second and a third stream.
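
To make the logic concrete, here is a minimal plain-Python sketch of the filtering step I have in mind. It is not DLT or Spark code. The field names (`group_key`, `is_first`, `marker`, `payload`) are made up for illustration, the in-memory set only stands in for whatever lookup structure would be used, and it assumes the first event of a group arrives before the later ones:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass(frozen=True)
class Event:
    # All field names are hypothetical, chosen only for this sketch.
    group_key: str
    is_first: bool      # True for the logically first event of a group
    marker: bool        # only meaningful on the first event
    payload: str = ""


def filter_relevant(events: Iterable[Event]) -> Iterator[Event]:
    """Yield only events whose group's first event had the marker set."""
    # Stand-in for the external lookup table holding relevant group keys.
    relevant_keys: Set[str] = set()
    for event in events:
        # Register the group when its first event carries the marker.
        if event.is_first and event.marker:
            relevant_keys.add(event.group_key)
        # Per-event lookup: keep the event only if its group is relevant.
        if event.group_key in relevant_keys:
            yield event


events = [
    Event("a", True, True, "a-1"),    # group a is relevant
    Event("b", True, False, "b-1"),   # group b is not
    Event("a", False, False, "a-2"),
    Event("b", False, False, "b-2"),
]
relevant = [e.payload for e in filter_relevant(events)]
```

The set of relevant keys only grows over time and never expires, which is why I am asking about a dedicated lookup structure instead of a windowed join.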

Doing this in DLT streaming tables will probably be too slow, since, as far as I understand, row-based lookups are probably not supported.
Is there a best practice for using a fast, maintainable lookup structure, such as a Postgres table holding just the group keys, where I can do a lookup per event?

Thank you in advance.

 
