Dear @Walter_C, thank you for your detailed response regarding watermark handling in Delta Live Tables (DLT). I appreciate the guidance provided, but I would like further clarification on a couple of points related to our use case.
1. Auto-Saving Dropped Records Due to Watermark
We are currently using the APPLY CHANGES API in Delta Live Tables for our pipeline. Is there a built-in mechanism or recommended approach to automatically save records that are dropped due to watermark thresholds? Specifically, we are looking for:
- Step-by-step guidelines or documentation on how to implement this functionality.
- Code examples or configurations that demonstrate how to capture these late records into a separate Delta table for later processing.
If this is not natively supported, could you recommend an alternative approach to achieve this while maintaining pipeline efficiency?
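To make the request concrete, here is a minimal plain-Python sketch of the behaviour we are after. It is not DLT code: the function name `split_by_watermark` and the `event_time` field are hypothetical, and the watermark is simplified to "maximum event time seen so far minus the allowed delay", evaluated per record rather than per micro-batch as Structured Streaming does.

```python
from datetime import datetime, timedelta


def split_by_watermark(records, delay):
    """Separate on-time records from records older than the watermark.

    Simplified model: the watermark is the maximum event time seen so far
    minus `delay`, checked record by record in arrival order.
    """
    on_time, late = [], []
    max_event_time = None
    for rec in records:
        ts = rec["event_time"]
        if max_event_time is not None and ts < max_event_time - delay:
            # This record would be dropped by the watermark; we want to keep it.
            late.append(rec)
        else:
            on_time.append(rec)
            if max_event_time is None or ts > max_event_time:
                max_event_time = ts
    return on_time, late


# Hypothetical sample data: record 3 arrives 25 minutes behind record 2.
records = [
    {"id": 1, "event_time": datetime(2024, 1, 1, 12, 0)},
    {"id": 2, "event_time": datetime(2024, 1, 1, 12, 30)},
    {"id": 3, "event_time": datetime(2024, 1, 1, 12, 5)},
]
on_time, late = split_by_watermark(records, timedelta(minutes=10))
```

In the pipeline, the `late` list would correspond to the separate Delta table we would like to populate.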
2. Auto-Triggering Updates for Dropped Records
Once the dropped records are saved, we aim to reprocess them and update the final table automatically. Could you provide guidance on:
- How to design a recovery mechanism that triggers updates for these saved records without manual intervention.
- Best practices for ensuring these updates do not introduce duplicates or inconsistencies in the final table.
- Any optimizations we can apply to minimize resource usage and processing time during this recovery process.
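To illustrate the duplicate concern, the property we need from the recovery step is idempotence: replaying the same saved records should leave the final table unchanged. A minimal plain-Python sketch under our own assumptions (the dictionary stands in for the final table, and the `id` and `seq` fields are hypothetical stand-ins for the `keys` and `sequence_by` columns we pass to APPLY CHANGES):

```python
def apply_late_records(final_table, late_records):
    """Upsert late records by `id`, keeping only the highest `seq` per key."""
    for rec in late_records:
        current = final_table.get(rec["id"])
        # Only overwrite when the incoming record is strictly newer.
        if current is None or rec["seq"] > current["seq"]:
            final_table[rec["id"]] = rec
    return final_table


# Hypothetical state: key 1 already holds a newer version than the late record.
final_table = {1: {"id": 1, "seq": 5, "value": "new"}}
late_records = [
    {"id": 1, "seq": 3, "value": "old"},
    {"id": 2, "seq": 1, "value": "x"},
]

apply_late_records(final_table, late_records)
apply_late_records(final_table, late_records)  # replaying must change nothing
```

Here the stale update for key 1 is ignored and key 2 is inserted once, even though the batch is applied twice.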
We would greatly appreciate any additional insights or references to best practices for managing late-arriving data in DLT pipelines. Looking forward to your response!