Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

PKEY Upserting Pattern With Older Runtimes

ChristianRRL
Valued Contributor

Hi there,

I'm aware that newer Databricks runtimes support some great features, including primary and foreign key constraints. If we have clusters running older runtime versions, are there upserting patterns that have been used before and would be recommended?

For additional context, our clusters are mostly on Runtime 13.3, and we have one running 14.1.

1 ACCEPTED SOLUTION

Walter_C
Databricks Employee

For clusters running older Databricks runtime versions, such as 13.3, you can still implement upserting patterns effectively, even though they may not support the latest features like primary and foreign key constraints available in newer runtimes.

One common upserting pattern involves using the MERGE INTO statement, which allows you to merge a source dataset into a target dataset based on a specified condition. This pattern is useful for handling both inserts and updates in a single operation. Here is a basic example of how you can use the MERGE INTO statement:

MERGE INTO target_table AS target
USING source_table AS source
ON target.id = source.id
WHEN MATCHED THEN
  UPDATE SET target.column1 = source.column1, target.column2 = source.column2
WHEN NOT MATCHED THEN
  INSERT (id, column1, column2) VALUES (source.id, source.column1, source.column2);

This approach ensures that if a record with the same id exists in the target table, it will be updated with the values from the source table. If the record does not exist, it will be inserted.
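The matched/not-matched logic of MERGE can be sketched in plain Python (the `id`/`column1` names mirror the hypothetical columns in the SQL above; this only illustrates the semantics, not the Delta implementation):

```python
# Sketch of MERGE INTO upsert semantics using plain dicts.
# Rows are matched on "id": matched rows are updated with the
# source values, unmatched source rows are inserted.
def merge_upsert(target_rows, source_rows):
    # Index the target by its merge key, mirroring "ON target.id = source.id".
    merged = {row["id"]: dict(row) for row in target_rows}
    for src in source_rows:
        if src["id"] in merged:
            # WHEN MATCHED THEN UPDATE SET ...
            merged[src["id"]].update(src)
        else:
            # WHEN NOT MATCHED THEN INSERT ...
            merged[src["id"]] = dict(src)
    return sorted(merged.values(), key=lambda r: r["id"])

target = [{"id": 1, "column1": "a"}, {"id": 2, "column1": "b"}]
source = [{"id": 2, "column1": "B"}, {"id": 3, "column1": "c"}]
print(merge_upsert(target, source))
# id 1 untouched, id 2 updated to "B", id 3 inserted
```

On Databricks itself you would run the MERGE INTO statement (or the equivalent `DeltaTable.merge` API); the sketch only shows what "matched" and "not matched" mean.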


2 REPLIES




As @Walter_C mentioned, MERGE is the proper way to perform an upsert. I just want to add that in Databricks, primary and foreign key constraints are informational only and are not enforced. So @ChristianRRL, be aware of that, because I see a lot of folks from the RDBMS world run into issues after assuming these constraints work like they do in a standard RDBMS engine.
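One practical consequence (my addition, not from the reply above): since the primary key is not enforced, nothing stops duplicate keys from arriving in your source data, and a MERGE fails if multiple source rows match the same target row. A minimal plain-Python sketch of a "keep last row per key" dedup step you would apply to the source before merging (the `id` key is a hypothetical merge key):

```python
# Sketch: keep only the last row per id before upserting, since an
# informational PRIMARY KEY does not prevent duplicate keys by itself.
def dedupe_by_id(rows):
    latest = {}
    for row in rows:  # later rows overwrite earlier ones ("keep last")
        latest[row["id"]] = row
    return list(latest.values())

rows = [{"id": 1, "v": "old"}, {"id": 1, "v": "new"}, {"id": 2, "v": "x"}]
print(dedupe_by_id(rows))
# one row per id, with the last occurrence winning
```

In Spark the analogous step would be something like `source_df.dropDuplicates(["id"])`, typically after ordering so the desired row wins.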
