We receive several datasets where the full dump is delivered daily or weekly. What is the best way to ingest these into Databricks using DLT or basic PySpark while adhering to the medallion architecture?
1. If we use Auto Loader into Bronze, we'd end up appending roughly 100,000 rows to the Bronze table every day (with ~99% duplicates).
How would we then move changes or additions downstream?
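For context, this is the pattern we have in mind — a minimal sketch only, where the table names, landing path, and `customer_id` key are placeholders, not our real schema. Bronze appends each full dump as-is, and `apply_changes` with SCD type 1 upserts into Silver by key, so unchanged rows simply overwrite themselves and only real changes or additions flow downstream:

```python
# Hypothetical sketch — table names, path, and key column are assumptions.
import dlt
from pyspark.sql import functions as F

# Bronze: land each full dump as-is, tagged with an ingestion timestamp.
@dlt.table(name="customers_bronze")
def customers_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.inferColumnTypes", "true")
        .load("/Volumes/raw/customers/")  # assumed landing path
        .withColumn("_ingest_ts", F.current_timestamp())
    )

# Silver: upsert by business key; SCD type 1 keeps only the latest
# version of each row, effectively deduplicating the daily dumps.
dlt.create_streaming_table("customers_silver")

dlt.apply_changes(
    target="customers_silver",
    source="customers_bronze",
    keys=["customer_id"],          # assumed business key
    sequence_by=F.col("_ingest_ts"),
    stored_as_scd_type=1,
)
```

This keeps Silver clean, but Bronze still grows by the full dump size every day — which is part of what we're asking about.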