I want to create a DLT pipeline that incrementally processes CSV files arriving daily. However, some of those files are duplicates: they have the same names and contents but land in different directories. What is the best way to handle this? I assume row-level deduplication would be inefficient, but I'm not sure whether file-level deduplication is even possible with DLT streaming.
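
For context, here's a minimal sketch of the kind of ingestion I have in mind (the paths and table name are just placeholders), using Auto Loader inside DLT:

```python
import dlt

@dlt.table(comment="Raw daily CSV ingest via Auto Loader")
def raw_daily_csv():
    return (
        spark.readStream.format("cloudFiles")   # Auto Loader source
        .option("cloudFiles.format", "csv")
        .option("header", "true")
        # Hypothetical landing zone: the same file name can show up
        # under more than one dated directory.
        .load("/mnt/landing/*/")
    )
```

As I understand it, Auto Loader tracks already-processed files by their full path, so a file re-delivered under a different directory counts as a new path and gets ingested again, which is exactly the duplication I'm trying to avoid.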