Here an example for a customer table.
The source_table contains new or updated customer data, and the target_table is the Delta table that maintains historical records.
Table Structures
source_table: contains the latest customer data.
customer_id: Unique identifier for the customer.
name: Customer's name.
address: Customer's address.
email: Customer's email.
phone: Customer's phone number.
target_table: contains the historical customer data.
customer_id: Unique identifier for the customer.
name: Customer's name.
address: Customer's address.
email: Customer's email.
phone: Customer's phone number.
valid_from: Date when the record became effective.
valid_to: Date until the record is effective.
is_current: Flag indicating the current active record.
hash_value: Hash of the attributes to detect changes.
WITH source_with_hash AS (
SELECT
customer_id,
name,
address,
email,
phone,
md5(concat_ws('|', name, address, email, phone)) AS hash_value
FROM source_table
)
MERGE INTO target_table AS target
USING source_with_hash AS source
ON target.customer_id = source.customer_id
AND target.is_current = true
WHEN MATCHED AND target.hash_value != source.hash_value THEN
UPDATE SET
target.valid_to = current_date - 1,
target.is_current = false
WHEN NOT MATCHED BY TARGET THEN
INSERT (customer_id, name, address, email, phone, valid_from, valid_to, is_current, hash_value)
VALUES (source.customer_id, source.name, source.address, source.email, source.phone, current_date, '9999-12-31', true, source.hash_value)
WHEN NOT MATCHED BY SOURCE AND target.is_current = true THEN
UPDATE SET
target.valid_to = current_date - 1,
target.is_current = false;
Here the explanation about all parts of the sentence.
WITH Clause:
Creates a subquery source_with_hash that adds a hash_value column to the source_table. This column contains an MD5 hash of the relevant attributes to detect changes.
MATCHED Clause:
Handles updates where there are changes in the source data (source.hash_value is different from target.hash_value).
Updates the valid_to date of the current record in the target table to the previous day and sets is_current to false.
NOT MATCHED BY TARGET Clause:
Inserts new records that do not exist in the target table.
Inserts the new records with valid_from set to the current date, valid_to set to '9999-12-31', and is_current set to true.
NOT MATCHED BY SOURCE Clause:
Handles records that are in the target table but not in the source table (optional, if you want to handle deletions).
Updates the valid_to date to the previous day and sets is_current to false.
You only have to adjust the column names and logic according to your specific schema and requirements.
I hope it helps you.