Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

PKEY Upserting Pattern With Older Runtimes

ChristianRRL
Valued Contributor

Hi there,

I'm aware that newer Databricks runtimes support some great features, including primary and foreign key constraints. If we have clusters running older runtime versions, are there upserting patterns that have been used before and would be recommended?

For additional context, our clusters are mostly on Runtime 13.3, and we have one running 14.1.

1 ACCEPTED SOLUTION

Walter_C
Databricks Employee

For clusters running older Databricks runtime versions, such as 13.3, you can still implement upserting patterns effectively, even though they may not support the latest features like primary and foreign key constraints available in newer runtimes.

One common upserting pattern involves using the MERGE INTO statement, which allows you to merge a source dataset into a target dataset based on a specified condition. This pattern is useful for handling both inserts and updates in a single operation. Here is a basic example of how you can use the MERGE INTO statement:

MERGE INTO target_table AS target
USING source_table AS source
ON target.id = source.id
WHEN MATCHED THEN
  UPDATE SET target.column1 = source.column1, target.column2 = source.column2
WHEN NOT MATCHED THEN
  INSERT (id, column1, column2) VALUES (source.id, source.column1, source.column2);

This approach ensures that if a record with the same id exists in the target table, it will be updated with the values from the source table. If the record does not exist, it will be inserted.
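The matched/not-matched logic of MERGE can be sketched in plain Python (the `id`/`column1` names mirror the hypothetical columns in the SQL above; this only illustrates the semantics, not the Delta implementation):

```python
# Sketch of MERGE INTO upsert semantics using plain dicts.
# Rows are matched on "id": matched rows are updated with the
# source values, unmatched source rows are inserted.
def merge_upsert(target_rows, source_rows):
    # Index the target by its merge key, mirroring "ON target.id = source.id".
    merged = {row["id"]: dict(row) for row in target_rows}
    for src in source_rows:
        if src["id"] in merged:
            # WHEN MATCHED THEN UPDATE SET ...
            merged[src["id"]].update(src)
        else:
            # WHEN NOT MATCHED THEN INSERT ...
            merged[src["id"]] = dict(src)
    return sorted(merged.values(), key=lambda r: r["id"])

target = [{"id": 1, "column1": "a"}, {"id": 2, "column1": "b"}]
source = [{"id": 2, "column1": "B"}, {"id": 3, "column1": "c"}]
print(merge_upsert(target, source))
# id 1 untouched, id 2 updated to "B", id 3 inserted
```

On Databricks itself you would run the MERGE INTO statement (or the equivalent `DeltaTable.merge` API); the sketch only shows what "matched" and "not matched" mean.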


2 REPLIES




As @Walter_C mentioned, MERGE is the proper way to perform an upsert. I just want to add that in Databricks, primary and foreign key constraints are informational only and are not enforced. So @ChristianRRL, be aware of that, because I see a lot of folks from the RDBMS world run into issues after assuming these constraints work like they do in a standard RDBMS engine.
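One practical consequence (my addition, not from the reply above): since the primary key is not enforced, nothing stops duplicate keys from arriving in your source data, and a MERGE fails if multiple source rows match the same target row. A minimal plain-Python sketch of a "keep last row per key" dedup step you would apply to the source before merging (the `id` key is a hypothetical merge key):

```python
# Sketch: keep only the last row per id before upserting, since an
# informational PRIMARY KEY does not prevent duplicate keys by itself.
def dedupe_by_id(rows):
    latest = {}
    for row in rows:  # later rows overwrite earlier ones ("keep last")
        latest[row["id"]] = row
    return list(latest.values())

rows = [{"id": 1, "v": "old"}, {"id": 1, "v": "new"}, {"id": 2, "v": "x"}]
print(dedupe_by_id(rows))
# one row per id, with the last occurrence winning
```

In Spark the analogous step would be something like `source_df.dropDuplicates(["id"])`, typically after ordering so the desired row wins.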
