cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Does MERGE statement preserve order? (Slowly Changing Dimensions)

TMNGB
New Contributor II

In the case of processing multiple source files - with potentially, one or multiple entity versions per source - being able to use the MERGE statement whilst preserving the order is key to ensure the correct versioning of entity versions (aka, version 1 starts at X to Y, then comes version 2 from Y to Z, etc).

However, as far as I can tell, there is no guarantee that the data will be processed (MERGED) according to the order in the DataFrame. Has anyone confirmed this?

The current way to bypass this is to process each extraction date separately for the MERGE statement however, it is quite a slow process since Azure takes a long time to MERGE.

1 ACCEPTED SOLUTION

Accepted Solutions

-werners-
Esteemed Contributor III

you could guarantee order by actually ordering the dataframe which you want to merge or use a window function (and keeping only the most recent record f.e.).

The upsert is an expensive operation, so depending on the amount of data which has to be evaluated it can take a while indeed.

There are some tweaks possible though:

https://docs.microsoft.com/en-us/azure/databricks/kb/delta/delta-merge-into

View solution in original post

2 REPLIES 2

-werners-
Esteemed Contributor III

you could guarantee order by actually ordering the dataframe which you want to merge or use a window function (and keeping only the most recent record f.e.).

The upsert is an expensive operation, so depending on the amount of data which has to be evaluated it can take a while indeed.

There are some tweaks possible though:

https://docs.microsoft.com/en-us/azure/databricks/kb/delta/delta-merge-into

Noopur_Nigam
Valued Contributor II

Hi @Guilherme Banhudo​ I hope that werners answer would have helped you. Please let me know if you still have doubts or queries.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group