Is it possible to write tables to Delta Lake using upsert mode? Would it be more efficient than overwrite?

Direo
Contributor
1 ACCEPTED SOLUTION

Kaniz
Community Manager

Hi @Direo Direo,

Answer 1:- You can upsert data from a source table, view, or DataFrame into a target Delta table by using the MERGE SQL operation. Delta Lake supports inserts, updates, and deletes in MERGE, and it supports extended syntax beyond the SQL standards to facilitate advanced use cases.

Suppose you have a source table named people10mupdates or a source path at /tmp/delta/people-10m-updates that contains new data for a target table named people10m or a target path at /tmp/delta/people-10m.

Some of these new records may already be present in the target data.

To merge the new data, you want to update rows where the personโ€™s id is already present and insert the new rows where no matching id is present.

You can run the following:

MERGE INTO people10m
USING people10mupdates
ON people10m.id = people10mupdates.id
WHEN MATCHED THEN
  UPDATE SET
    id = people10mupdates.id,
    firstName = people10mupdates.firstName,
    middleName = people10mupdates.middleName,
    lastName = people10mupdates.lastName,
    gender = people10mupdates.gender,
    birthDate = people10mupdates.birthDate,
    ssn = people10mupdates.ssn,
    salary = people10mupdates.salary
WHEN NOT MATCHED
  THEN INSERT (
    id,
    firstName,
    middleName,
    lastName,
    gender,
    birthDate,
    ssn,
    salary
  )
  VALUES (
    people10mupdates.id,
    people10mupdates.firstName,
    people10mupdates.middleName,
    people10mupdates.lastName,
    people10mupdates.gender,
    people10mupdates.birthDate,
    people10mupdates.ssn,
    people10mupdates.salary
  )

For syntax details, see the Delta Lake API reference for Scala, Java, and Python.
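For reference, the same upsert can be written against the Delta Lake Python API. This is a minimal sketch assuming the delta-spark package is installed and the example tables above already exist:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The target Delta table and the DataFrame holding new or changed rows.
target = DeltaTable.forName(spark, "people10m")
updates = spark.table("people10mupdates")

# Upsert: update rows whose id already exists, insert the rest.
(target.alias("t")
    .merge(updates.alias("u"), "t.id = u.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())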

Answer 2:-

The fine-grained update capability in Databricks Delta simplifies how you build your big data pipelines.

You no longer need to write complicated logic to overwrite tables and overcome a lack of snapshot isolation.

With fine-grained updates, your pipelines will also be more efficient since you donโ€™t need to read and overwrite entire tables.

With changing data, another critical capability required is the ability to roll back in case of bad writes.

Databricks Delta also offers rollback capabilities with the time travel feature, so that if you do a bad merge, you can easily roll back to an earlier version of the table.
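As a minimal sketch of that rollback path in Python (the version number and path are illustrative, and RESTORE requires a Delta Lake release that supports it):

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Time travel read: inspect the table as it looked before the bad merge.
before = (spark.read.format("delta")
    .option("versionAsOf", 5)
    .load("/tmp/delta/people-10m"))

# Restore the table in place to that earlier version.
DeltaTable.forPath(spark, "/tmp/delta/people-10m").restoreToVersion(5)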

Read more about the fine-grained updates feature in the Azure and AWS Databricks documentation.


3 REPLIES

Hubert-Dudek
Esteemed Contributor III

@Direo Direo, yes, you can use MERGE syntax for that: https://docs.delta.io/latest/delta-update.html

MERGE is more efficient than overwriting when you only need to update part of the data, but you have to design the matching logic yourself, so a full overwrite can be easier to manage. A sketch contrasting the two follows.
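To make that trade-off concrete, here is a hedged sketch of both approaches from a DataFrame. Note there is no built-in "upsert" save mode on the DataFrame writer, so the upsert goes through the MERGE API; the paths and the id column are illustrative:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
new_data = spark.table("people10mupdates")  # illustrative source of changes

# Option 1: overwrite -- rewrites the entire table; simple, but reads and
# replaces everything even when only a few rows changed.
new_data.write.format("delta").mode("overwrite").save("/tmp/delta/people-10m")

# Option 2: upsert via MERGE -- rewrites only the files containing matched rows.
(DeltaTable.forPath(spark, "/tmp/delta/people-10m").alias("t")
    .merge(new_data.alias("u"), "t.id = u.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())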


Kaniz
Community Manager

Hi @Direo Direo, Does the above suggestion help you? Were you able to write tables to Delta Lake using upsert mode?
