For clusters running older Databricks runtime versions, such as 13.3, you can still implement upserting patterns effectively, even though they may not support the latest features like primary and foreign key constraints available in newer runtimes.
One common upserting pattern involves using the MERGE INTO
statement, which allows you to merge a source dataset into a target dataset based on a specified condition. This pattern is useful for handling both inserts and updates in a single operation. Here is a basic example of how you can use the MERGE INTO
statement:
MERGE INTO target_table AS target
USING source_table AS source
ON target.id = source.id
WHEN MATCHED THEN
UPDATE SET target.column1 = source.column1, target.column2 = source.column2
WHEN NOT MATCHED THEN
INSERT (id, column1, column2) VALUES (source.id, source.column1, source.column2)
This approach ensures that if a record with the same id
exists in the target table, it will be updated with the values from the source table. If the record does not exist, it will be inserted.