The following is my scenario:
- I need to query on a daily basis from an external table that maintains a row version
- I would like to be able to query for all records where the row version is greater than the max previously processed row version.
- The source is not a constant streaming source
I've used the delta live tables from a streaming source and am familiar with how they checkpoint from a streaming source such as Kinesis, but much of that is handled internally.
If I need to handle check pointing myself, say to store the last max date processed or the last row version number processed then use that in the subsequent query as the starting point, what is the recommended approach.
I've come across the idea of using a separate view to maintain the max value for a given table where I can query that table to get the starting value for the next incremental batch, but I'm wondering if there are better approaches to this, or if I can somehow utilize the streaming table checkpoint mechanism for this.