Hi @Direo Direo ,
Answer 1:- You can upsert data from a source table, view, or DataFrame into a target Delta table by using the MERGE SQL operation. Delta Lake supports inserts, updates, and deletes in MERGE, and it supports extended syntax beyond the SQL standards to facilitate advanced use cases.
Suppose you have a source table named people10mupdates or a source path at /tmp/delta/people-10m-updates that contains new data for a target table named people10m or a target path at /tmp/delta/people-10m.
Some of these new records may already be present in the target data.
To merge the new data, you want to update rows where the person’s id is already present and insert the new rows where no matching id is present.
You can run the following:
MERGE INTO people10m
USING people10mupdates
ON people10m.id = people10mupdates.id
WHEN MATCHED THEN
  UPDATE SET
    id = people10mupdates.id,
    firstName = people10mupdates.firstName,
    middleName = people10mupdates.middleName,
    lastName = people10mupdates.lastName,
    gender = people10mupdates.gender,
    birthDate = people10mupdates.birthDate,
    ssn = people10mupdates.ssn,
    salary = people10mupdates.salary
WHEN NOT MATCHED THEN
  INSERT (
    id,
    firstName,
    middleName,
    lastName,
    gender,
    birthDate,
    ssn,
    salary
  )
  VALUES (
    people10mupdates.id,
    people10mupdates.firstName,
    people10mupdates.middleName,
    people10mupdates.lastName,
    people10mupdates.gender,
    people10mupdates.birthDate,
    people10mupdates.ssn,
    people10mupdates.salary
  );
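Conceptually, the MERGE above implements upsert semantics: rows whose id already exists in the target are updated, and rows with a new id are inserted. A minimal plain-Python sketch of that logic (the in-memory rows and values here are hypothetical, purely for illustration; this is not how Delta executes MERGE):

```python
# Rows keyed by id stand in for the people10m target table.
target = {
    1: {"id": 1, "firstName": "Ada", "salary": 100},
    2: {"id": 2, "firstName": "Ben", "salary": 200},
}
# Incoming rows stand in for people10mupdates.
updates = [
    {"id": 2, "firstName": "Benjamin", "salary": 250},  # matched   -> UPDATE
    {"id": 3, "firstName": "Cleo", "salary": 300},      # unmatched -> INSERT
]

for row in updates:
    # WHEN MATCHED THEN UPDATE ... / WHEN NOT MATCHED THEN INSERT ...
    target[row["id"]] = row

print(sorted(target))  # -> [1, 2, 3]
```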
For Scala, Java, and Python syntax details, see the Delta Lake API reference.
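As a sketch of what the same upsert looks like through the Python API (assuming the delta-spark package is installed and `spark` is an active SparkSession with Delta Lake configured; table names as above):

```python
from delta.tables import DeltaTable

# Assumes `spark` is an active SparkSession with Delta Lake configured.
people = DeltaTable.forName(spark, "people10m")   # target Delta table
updates = spark.table("people10mupdates")         # source of new rows

(people.alias("t")
    .merge(updates.alias("u"), "t.id = u.id")
    .whenMatchedUpdateAll()      # UPDATE SET * for matching ids
    .whenNotMatchedInsertAll()   # INSERT * for new ids
    .execute())
```

Since both tables share the same schema here, whenMatchedUpdateAll/whenNotMatchedInsertAll avoid listing every column by hand, mirroring the column-by-column SQL above.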
Source
Answer 2:-
The fine-grained update capability in Databricks Delta simplifies how you build your big data pipelines.
You no longer need to write complicated logic to overwrite tables and overcome a lack of snapshot isolation.
With fine-grained updates, your pipelines will also be more efficient since you don’t need to read and overwrite entire tables.
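For example (a sketch assuming delta-spark and an active SparkSession; the predicates and column values are hypothetical), fine-grained updates and deletes can be issued directly against a Delta table instead of rewriting it:

```python
from delta.tables import DeltaTable

# Assumes `spark` is an active SparkSession with Delta Lake configured.
people = DeltaTable.forName(spark, "people10m")

# Update only the rows that match a predicate...
people.update(
    condition="gender = 'F'",
    set={"salary": "salary + 500"},
)

# ...or delete a narrow slice without touching the rest of the table.
people.delete("birthDate < '1900-01-01'")
```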
With changing data, another critical capability required is the ability to roll back in case of bad writes.
Databricks Delta also offers rollback capabilities through its time travel feature, so if you do a bad merge, you can easily roll back to an earlier version.
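As a sketch (again assuming delta-spark and an active SparkSession; the version number 5 is hypothetical), time travel lets you read an earlier snapshot after a bad merge, or restore the table to it outright:

```python
from delta.tables import DeltaTable

# Read the table as it looked at an earlier version (version 5 is hypothetical).
old_snapshot = (spark.read.format("delta")
    .option("versionAsOf", 5)
    .load("/tmp/delta/people-10m"))

# Or roll the table itself back to that version.
DeltaTable.forName(spark, "people10m").restoreToVersion(5)
```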
Read more (Azure | AWS) about the fine-grained updates feature. To see the feature in action, sign up for a free trial of Databricks and try it out.