AmanSehgal
Honored Contributor III

A way to track INSERT/UPDATE/DELETE would be to split out your upsert process as a 3 part process.

Step1: Include an update timestamp field in your target table.

Step2: In your incoming refresh dataset, add a update timestamp column and set it to current timestamp.

UPDATE count

Step3: Run upsert job with just whenMatchedUpdate clause. Count rows that have current timestamp for update.

Also, get an extract of the rows with current timestamp - these are the rows that you'll UPDATE in DyanmoDB

INSERT count

Step5: Get the row count of table - RC1.

Step6: Run upsert job with just whenNotMatchedInsert clause. Get row count of table after insert - RC2

INSERT count = RC2-RC1

Get an extract of the rows with current timestamp and perform a left-anti join with UPDATE df - these are the rows that you'll INSERT in DyanmoDB

DELETE count

Step7: Get the row count of table - RC1.

Step8: Run upsert job with just whenNotMatchedDelete clause. Get row count of table after insert - RC2

DELETE count = RC1-RC2

From your source data, there would be a way to identify the records that are to be deleted. Grab those indices/primary keys to delete records.