cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Does MERGE statement preserve order? (Slowly Changing Dimensions)

TMNGB
New Contributor II

In the case of processing multiple source files - with potentially, one or multiple entity versions per source - being able to use the MERGE statement whilst preserving the order is key to ensure the correct versioning of entity versions (aka, version 1 starts at X to Y, then comes version 2 from Y to Z, etc).

However, as far as I can tell, there is no guarantee that the data will be processed (MERGED) according to the order in the DataFrame. Has anyone confirmed this?

The current way to bypass this is to process each extraction date separately for the MERGE statement however, it is quite a slow process since Azure takes a long time to MERGE.

1 ACCEPTED SOLUTION

Accepted Solutions

-werners-
Esteemed Contributor III

you could guarantee order by actually ordering the dataframe which you want to merge or use a window function (and keeping only the most recent record f.e.).

The upsert is an expensive operation, so depending on the amount of data which has to be evaluated it can take a while indeed.

There are some tweaks possible though:

https://docs.microsoft.com/en-us/azure/databricks/kb/delta/delta-merge-into

View solution in original post

2 REPLIES 2

-werners-
Esteemed Contributor III

you could guarantee order by actually ordering the dataframe which you want to merge or use a window function (and keeping only the most recent record f.e.).

The upsert is an expensive operation, so depending on the amount of data which has to be evaluated it can take a while indeed.

There are some tweaks possible though:

https://docs.microsoft.com/en-us/azure/databricks/kb/delta/delta-merge-into

Noopur_Nigam
Valued Contributor II
Valued Contributor II

Hi @Guilherme Banhudo​ I hope that werners answer would have helped you. Please let me know if you still have doubts or queries.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.