cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Delta Upsert performance on empty table

pantelis_mare
Contributor III

Hello all,

I was just wandering, performance wise how does it compare a plain write operation with a merge operation on an EMPTY delta table. Do we really risk to get significant performance drop?

The use case would be to have the same pipeline for initial and incremental load.

Thank you in advance!

1 ACCEPTED SOLUTION

Accepted Solutions

-werners-
Esteemed Contributor III

I haven't tested this but I think the merge will be slower. A merge will check the merge conditions which will take extra time compared to the blind append.

So unless delta lake has a check built in to check on empty tables, my guess is it will be slower.

By how much? No idea, I don't think it will be by a lot. But a blind append is of course wicked fast.

If you want a single script for init/incremental, you can do a check on a count of records in the target table. If that is 0, pass a different query.

I like to keep them separated though.

View solution in original post

10 REPLIES 10

Kaniz_Fatma
Community Manager
Community Manager

Hi @ pantelis_mare! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else I will follow up with my team and get back to you soon. Thanks.

-werners-
Esteemed Contributor III

I haven't tested this but I think the merge will be slower. A merge will check the merge conditions which will take extra time compared to the blind append.

So unless delta lake has a check built in to check on empty tables, my guess is it will be slower.

By how much? No idea, I don't think it will be by a lot. But a blind append is of course wicked fast.

If you want a single script for init/incremental, you can do a check on a count of records in the target table. If that is 0, pass a different query.

I like to keep them separated though.

pantelis_mare
Contributor III

Hello @Werner Stinckensโ€‹ !

Thanks for taking the time to answer. The question is rather how much slower would it be. My understanding is that during merge there is a first step of identifying what rows are to be insterted (simple append), thus if this step's duration is minimal (order of seconds) because of having an empty delta table, then it practically does not make any sense to increase our codebase.

If somebody has seen any tests/benchmarks I would appreciate!

-werners-
Esteemed Contributor III

In case no one has benchmarks/tests, you could try it yourself? Create an empty delta table and try with merge and append.

My guess is that it will be a matter of seconds, as you already mentioned.

I have not seen benchmarks for this but I would imagine that a merge would not be much slower than a pure insert because the merge would quickly identify that all rows would need to be inserted.

Thank you @Ryan Chynoweth (Databricks)โ€‹ . This is what I imagine as well. Will be doing a benchmark in the following days and will post the findings

Yes, please share your results on this post so that future users can get the answer too!

Kaniz_Fatma
Community Manager
Community Manager

Hi @Pantelis Maroudisโ€‹  , Just a friendly follow-up. Do you still need help, or the above responses help you to find the solution? Please let us know.

pantelis_mare
Contributor III

Hello @Kaniz Fatmaโ€‹ ,

Unfortunately I did not do any further investigation on the subject. Given that the merge on an empty table will only be done once at the creation of a table, it wouldn't really matter to be honest.

Hi @Pantelis Maroudisโ€‹ , Do you want us to resolve this thread in that case?

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group