- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
02-14-2022 06:43 AM
A way to track INSERT/UPDATE/DELETE would be to split out your upsert process as a 3 part process.
Step1: Include an update timestamp field in your target table.
Step2: In your incoming refresh dataset, add a update timestamp column and set it to current timestamp.
UPDATE count
Step3: Run upsert job with just whenMatchedUpdate clause. Count rows that have current timestamp for update.
Also, get an extract of the rows with current timestamp - these are the rows that you'll UPDATE in DyanmoDB
INSERT count
Step5: Get the row count of table - RC1.
Step6: Run upsert job with just whenNotMatchedInsert clause. Get row count of table after insert - RC2
INSERT count = RC2-RC1
Get an extract of the rows with current timestamp and perform a left-anti join with UPDATE df - these are the rows that you'll INSERT in DyanmoDB
DELETE count
Step7: Get the row count of table - RC1.
Step8: Run upsert job with just whenNotMatchedDelete clause. Get row count of table after insert - RC2
DELETE count = RC1-RC2
From your source data, there would be a way to identify the records that are to be deleted. Grab those indices/primary keys to delete records.