Hi @Nathant93, Certainly! Handling record changes after a MERGE or INSERT operation can be crucial for tracking and auditing.
Let’s explore a few approaches:
Timestamp Approach:
- As you mentioned, adding a timestamp column to your records is a straightforward way to track changes. However, it does require modifying your existing tables.
- If you choose this approach, you can select the records based on the timestamp column to identify the newly inserted or updated rows.
Change Data Feed (Delta Table):
- You can enable the Change Data Feed feature if you’re using a Delta table (available in Databricks or Delta Lake).
- By setting delta.enableChangeDataFeed = true, you’ll receive information about rows that have changed (inserted, updated, or deleted) in the table.
- You can then process this feed using another stream or batch job to track the changes.
Row Count Tracking:
- If you specifically need row counts for different actions (inserts, updates, and new inserts), consider the following approach:
- After your MERGE statement, insert the appropriate rows into a temporary table (similar to your existing process).
- Use the $action column (which indicates the type of action: INSERT, UPDATE, or DELETE) to tally the contents of the temporary table by operation type.
- This way, you’ll have separate counts for each action (inserts, updates, and new inserts).
Remember to adapt this example to your specific schema and requirements. The key is using the $action column and a temporary table to track the actions performed during the MERGE operation.
Feel free to adjust the approach based on your environment and constraints. 😊