Hi @mark_ott, thank you for your help.
I have a follow-up question regarding data completeness and out-of-order processing. I have decided to go with the Delta table option, since super-low latency is not an issue and since this option (seemingly) has the lowest maintenance effort.
I am wondering how to ensure data completeness in the following scenario: I have created the lookup table consent_ids using a DLT pipeline that reads from another DLT table, consent_source. Now I want to implement the actual filter logic, which reads all the events from consent_source and uses a stream-static join with consent_ids.
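For reference, here is a minimal sketch of what I have in mind for the filter (assuming consent_id is the join key; table names as above):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Streaming side: read all events from the DLT-managed Delta table
events = spark.readStream.table("consent_source")

# Static side: with Delta, my understanding is that each micro-batch
# sees the latest version of this table
consent_ids = spark.read.table("consent_ids")

# Stream-static inner join: an event whose consent_id is not yet in the
# lookup table is dropped in that micro-batch and is not revisited later,
# which is exactly the completeness concern described below
filtered = events.join(consent_ids, on="consent_id", how="inner")
```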
The events stored in consent_source should be roughly in order, since they come from a Kinesis stream that is sharded by consent_id (meaning events with the same consent_id should be ordered, while events with different consent_ids can be out of order with respect to event time).
The consent_ids stored in the lookup table are extracted from the very first event in each sequence of events per consent_id. How can I ensure that each consent_id is already stored in the lookup table when it is needed during the stream-static join? Since I have to store the final result of my flow in JSON files and not in tables, I assume one coordinated DLT pipeline is not feasible here. My plan is to wrap both parts into a Databricks job, sketched below.
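To make the job idea concrete, this is roughly how the second task would write the join result out as JSON files, continuing the snippet above (paths are placeholders):

```python
(
    filtered.writeStream
    .format("json")                                            # line-delimited JSON output files
    .option("path", "/mnt/output/filtered_events")             # placeholder output path
    .option("checkpointLocation", "/mnt/output/_checkpoints")  # placeholder checkpoint path
    .trigger(availableNow=True)  # drain the current backlog, then stop, so the job task completes
    .start()
    .awaitTermination()
)
```

The availableNow trigger makes the task behave like an incremental batch that exits once the backlog is processed, which is what lets me order it after the lookup-table update in a scheduled Databricks job.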