Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-25-2024 06:16 AM
In your scenario, if the data loaded on day 2 also includes all the data from day 1, you can still apply a "remove duplicates" logic. For instance, you could compute a hashdiff by hashing all the columns and use this to exclude rows you've already seen. However, I believe you first need to load all the data, regardless of whether it contains duplicates. Once loaded, you can determine which rows are duplicates. Essentially, you need to examine the data before identifying duplicates.