Re: Overwriting a delta table using DLT

giuseppegrieco · 06-25-2024

In your scenario, if the data loaded on day 2 also includes all the data from day 1, you can still apply "remove duplicates" logic. For instance, you could compute a hashdiff by hashing all the columns of each row and use it to exclude rows you've already seen. However, I believe you first need to load all the data, whether or not it contains duplicates: you can only tell which rows are duplicates once you have looked at them.
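
To make the hashdiff idea concrete, here is a minimal sketch in plain Python rather than DLT code. The names `hashdiff`, `new_rows`, the sample columns, and the in-memory `seen` set are all made up for illustration; in a real pipeline the seen hashes would live in the target table, and in Spark you would typically build the hash with something like `sha2` over `concat_ws` instead of `hashlib`.

```python
import hashlib


def hashdiff(row, columns):
    """Hash all listed columns of a row into a single hex digest."""
    # Encode None explicitly so a missing value differs from an empty string.
    parts = ["\x00" if row.get(c) is None else str(row[c]) for c in columns]
    # Join with a separator that is unlikely to appear in the data.
    payload = "\x1f".join(parts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def new_rows(incoming, seen_hashes, columns):
    """Return only rows whose hashdiff is not yet in seen_hashes.

    The full batch is read first; duplicates are identified afterwards.
    seen_hashes is updated in place with the hashes of the kept rows.
    """
    fresh = []
    for row in incoming:
        h = hashdiff(row, columns)
        if h not in seen_hashes:
            seen_hashes.add(h)
            fresh.append(row)
    return fresh


# Hypothetical sample data: day 2 repeats everything from day 1.
columns = ["id", "name", "amount"]
day1 = [
    {"id": 1, "name": "a", "amount": 10},
    {"id": 2, "name": "b", "amount": 20},
]
day2 = day1 + [{"id": 3, "name": "c", "amount": 30}]

seen = set()
loaded_day1 = new_rows(day1, seen, columns)  # both rows are new
loaded_day2 = new_rows(day2, seen, columns)  # only the id=3 row is new
```

Note that the whole day 2 batch is still read in full; the hash only decides which rows get kept afterwards.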