cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Delta Upsert performance on empty table

pantelis_mare
Contributor III

Hello all,

I was just wandering, performance wise how does it compare a plain write operation with a merge operation on an EMPTY delta table. Do we really risk to get significant performance drop?

The use case would be to have the same pipeline for initial and incremental load.

Thank you in advance!

1 ACCEPTED SOLUTION

Accepted Solutions

-werners-
Esteemed Contributor III

I haven't tested this but I think the merge will be slower. A merge will check the merge conditions which will take extra time compared to the blind append.

So unless delta lake has a check built in to check on empty tables, my guess is it will be slower.

By how much? No idea, I don't think it will be by a lot. But a blind append is of course wicked fast.

If you want a single script for init/incremental, you can do a check on a count of records in the target table. If that is 0, pass a different query.

I like to keep them separated though.

View solution in original post

10 REPLIES 10

Kaniz
Community Manager
Community Manager

Hi @ pantelis_mare! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else I will follow up with my team and get back to you soon. Thanks.

-werners-
Esteemed Contributor III

I haven't tested this but I think the merge will be slower. A merge will check the merge conditions which will take extra time compared to the blind append.

So unless delta lake has a check built in to check on empty tables, my guess is it will be slower.

By how much? No idea, I don't think it will be by a lot. But a blind append is of course wicked fast.

If you want a single script for init/incremental, you can do a check on a count of records in the target table. If that is 0, pass a different query.

I like to keep them separated though.

pantelis_mare
Contributor III

Hello @Werner Stinckens​ !

Thanks for taking the time to answer. The question is rather how much slower would it be. My understanding is that during merge there is a first step of identifying what rows are to be insterted (simple append), thus if this step's duration is minimal (order of seconds) because of having an empty delta table, then it practically does not make any sense to increase our codebase.

If somebody has seen any tests/benchmarks I would appreciate!

-werners-
Esteemed Contributor III

In case no one has benchmarks/tests, you could try it yourself? Create an empty delta table and try with merge and append.

My guess is that it will be a matter of seconds, as you already mentioned.

Ryan_Chynoweth
Honored Contributor III

I have not seen benchmarks for this but I would imagine that a merge would not be much slower than a pure insert because the merge would quickly identify that all rows would need to be inserted.

Thank you @Ryan Chynoweth (Databricks)​ . This is what I imagine as well. Will be doing a benchmark in the following days and will post the findings

Ryan_Chynoweth
Honored Contributor III

Yes, please share your results on this post so that future users can get the answer too!

Kaniz
Community Manager
Community Manager

Hi @Pantelis Maroudis​  , Just a friendly follow-up. Do you still need help, or the above responses help you to find the solution? Please let us know.

pantelis_mare
Contributor III

Hello @Kaniz Fatma​ ,

Unfortunately I did not do any further investigation on the subject. Given that the merge on an empty table will only be done once at the creation of a table, it wouldn't really matter to be honest.

Hi @Pantelis Maroudis​ , Do you want us to resolve this thread in that case?

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.