โ10-05-2021 08:24 AM
Hello all,
I was just wandering, performance wise how does it compare a plain write operation with a merge operation on an EMPTY delta table. Do we really risk to get significant performance drop?
The use case would be to have the same pipeline for initial and incremental load.
Thank you in advance!
โ10-06-2021 02:03 AM
I haven't tested this but I think the merge will be slower. A merge will check the merge conditions which will take extra time compared to the blind append.
So unless delta lake has a check built in to check on empty tables, my guess is it will be slower.
By how much? No idea, I don't think it will be by a lot. But a blind append is of course wicked fast.
If you want a single script for init/incremental, you can do a check on a count of records in the target table. If that is 0, pass a different query.
I like to keep them separated though.
โ10-06-2021 02:03 AM
I haven't tested this but I think the merge will be slower. A merge will check the merge conditions which will take extra time compared to the blind append.
So unless delta lake has a check built in to check on empty tables, my guess is it will be slower.
By how much? No idea, I don't think it will be by a lot. But a blind append is of course wicked fast.
If you want a single script for init/incremental, you can do a check on a count of records in the target table. If that is 0, pass a different query.
I like to keep them separated though.
โ10-08-2021 12:49 AM
Hello @Werner Stinckensโ !
Thanks for taking the time to answer. The question is rather how much slower would it be. My understanding is that during merge there is a first step of identifying what rows are to be insterted (simple append), thus if this step's duration is minimal (order of seconds) because of having an empty delta table, then it practically does not make any sense to increase our codebase.
If somebody has seen any tests/benchmarks I would appreciate!
โ10-08-2021 12:58 AM
In case no one has benchmarks/tests, you could try it yourself? Create an empty delta table and try with merge and append.
My guess is that it will be a matter of seconds, as you already mentioned.
โ10-11-2021 01:24 PM
I have not seen benchmarks for this but I would imagine that a merge would not be much slower than a pure insert because the merge would quickly identify that all rows would need to be inserted.
โ10-12-2021 12:49 AM
Thank you @Ryan Chynoweth (Databricks)โ . This is what I imagine as well. Will be doing a benchmark in the following days and will post the findings
โ10-12-2021 09:31 AM
Yes, please share your results on this post so that future users can get the answer too!
โ05-19-2022 12:41 AM
Hello @Kaniz Fatmaโ ,
Unfortunately I did not do any further investigation on the subject. Given that the merge on an empty table will only be done once at the creation of a table, it wouldn't really matter to be honest.
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโt want to miss the chance to attend and share knowledge.
If there isnโt a group near you, start one and help create a community that brings people together.
Request a New Group