Hello Databricks Community,
I am seeking guidance on handling full load scenarios with Delta Live Tables (DLT) and Unity Catalog. Here’s the situation I’m dealing with:
We have a data folder in Azure Data Lake Storage (ADLS) where we use Auto Loader to ingest the latest data into a Delta Live Table. The source system delivers a complete (full) extract to this folder every day. However, there are two key challenges:
- Record deletion: Records may be deleted in the source system, but those deletions never reach our DLT tables, because Auto Loader is append-only and ingests only newly arrived files.
- Overwriting data: Since the source system performs full loads daily and the data has no primary key columns or unique identifiers, we need to overwrite the existing data in the DLT table to accurately reflect the current state of the source.
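For reference, here is a simplified sketch of our current ingestion (the table name, file format, and storage path are placeholders for our actual values):

```python
import dlt

# Streaming table fed by Auto Loader: it picks up newly arrived files
# and appends them, which is why source-side deletes never propagate.
@dlt.table(name="bronze_daily_extract")  # placeholder name
def bronze_daily_extract():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")  # placeholder format
        .load("abfss://landing@<storage-account>.dfs.core.windows.net/data/")  # placeholder path
    )
```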
Given that we are using Unity Catalog with our DLT tables, I would like to understand the best practices for implementing a full load strategy that allows us to overwrite the entire dataset in our Delta tables. Specifically, I am looking for guidance on:
- How to effectively overwrite data in a Delta Live Table when the source system performs full loads (a rough sketch of an approach we are considering follows this list).
- Strategies to ensure that deleted records from the source system are also removed from the Delta table.
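One option we are considering is replacing the streaming table with a batch-defined DLT table (which Unity Catalog surfaces as a materialized view), since it is recomputed from the source on every pipeline update and would therefore drop deleted records automatically. A minimal sketch, assuming each daily extract replaces the previous files at the landing path (all names below are illustrative):

```python
import dlt

@dlt.table(
    name="silver_current_state",  # illustrative name
    comment="Recomputed from the latest full extract on each pipeline update.",
)
def silver_current_state():
    # Batch read (spark.read, not readStream): DLT recomputes this table
    # on every update, so rows deleted in the source disappear here too.
    return (
        spark.read.format("parquet")  # assumed source format
        .load("abfss://landing@<storage-account>.dfs.core.windows.net/data/")  # placeholder path
    )
```

Is this the recommended pattern, or would a periodic full refresh of the streaming table be a better fit? (dlt.apply_changes_from_snapshot looked promising, but it appears to require key columns, which our data does not have.)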
Any insights or examples on how to achieve this would be greatly appreciated.
Thank you!