<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta Upsert performance on empty table in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14047#M8599</link>
    <description>&lt;P&gt;Yes, please share your results on this post so that future users can get the answer too!&lt;/P&gt;</description>
    <pubDate>Tue, 12 Oct 2021 16:31:45 GMT</pubDate>
    <dc:creator>Ryan_Chynoweth</dc:creator>
    <dc:date>2021-10-12T16:31:45Z</dc:date>
    <item>
      <title>Delta Upsert performance on empty table</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14040#M8592</link>
      <description>&lt;P&gt;Hello all,&lt;/P&gt;&lt;P&gt;I was just wondering: performance-wise, how does a plain write operation compare with a merge operation on an &lt;B&gt;EMPTY&lt;/B&gt; Delta table? Do we really risk a significant performance drop?&lt;/P&gt;&lt;P&gt;The use case would be to have the same pipeline for both the initial and the incremental load.&lt;/P&gt;&lt;P&gt;Thank you in advance!&lt;/P&gt;</description>
      <pubDate>Tue, 05 Oct 2021 15:24:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14040#M8592</guid>
      <dc:creator>pantelis_mare</dc:creator>
      <dc:date>2021-10-05T15:24:18Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Upsert performance on empty table</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14042#M8594</link>
      <description>&lt;P&gt;I haven't tested this, but I think the merge will be slower. A merge has to evaluate the merge condition, which takes extra time compared to a blind append.&lt;/P&gt;&lt;P&gt;So unless Delta Lake has a built-in check for empty tables, my guess is it will be slower.&lt;/P&gt;&lt;P&gt;By how much? No idea, but I don't think it will be by a lot. But a blind append is of course wicked fast.&lt;/P&gt;&lt;P&gt;If you want a single script for init/incremental, you can check the record count of the target table. If that is 0, pass a different query.&lt;/P&gt;&lt;P&gt;I like to keep them separated though.&lt;/P&gt;</description>
      <pubDate>Wed, 06 Oct 2021 09:03:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14042#M8594</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-10-06T09:03:19Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Upsert performance on empty table</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14043#M8595</link>
      <description>&lt;P&gt;Hello @Werner Stinckens!&lt;/P&gt;&lt;P&gt;Thanks for taking the time to answer. The question is rather how much slower it would be. My understanding is that a merge starts with a step that identifies which rows are to be inserted (a simple append). If that step takes minimal time (on the order of seconds) because the Delta table is empty, then it makes little practical sense to increase our codebase.&lt;/P&gt;&lt;P&gt;If somebody has seen any tests/benchmarks, I would appreciate it!&lt;/P&gt;</description>
      <pubDate>Fri, 08 Oct 2021 07:49:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14043#M8595</guid>
      <dc:creator>pantelis_mare</dc:creator>
      <dc:date>2021-10-08T07:49:42Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Upsert performance on empty table</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14044#M8596</link>
      <description>&lt;P&gt;In case no one has benchmarks/tests, you could try it yourself: create an empty Delta table and load it once with a merge and once with an append.&lt;/P&gt;&lt;P&gt;My guess is that the difference will be a matter of seconds, as you already mentioned.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Oct 2021 07:58:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14044#M8596</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-10-08T07:58:14Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Upsert performance on empty table</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14045#M8597</link>
      <description>&lt;P&gt;I have not seen benchmarks for this, but I would imagine that a merge would not be much slower than a pure insert, because the merge would quickly identify that all rows need to be inserted.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Oct 2021 20:24:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14045#M8597</guid>
      <dc:creator>Ryan_Chynoweth</dc:creator>
      <dc:date>2021-10-11T20:24:39Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Upsert performance on empty table</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14046#M8598</link>
      <description>&lt;P&gt;Thank you&amp;nbsp;&lt;A href="https://community.databricks.com/s/profile/0053f000000pOXHAA2" alt="https://community.databricks.com/s/profile/0053f000000pOXHAA2" target="_blank"&gt;@Ryan Chynoweth&lt;/A&gt;&amp;nbsp;(Databricks). This is what I imagine as well. I will be running a benchmark in the coming days and will post the findings.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Oct 2021 07:49:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14046#M8598</guid>
      <dc:creator>pantelis_mare</dc:creator>
      <dc:date>2021-10-12T07:49:43Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Upsert performance on empty table</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14047#M8599</link>
      <description>&lt;P&gt;Yes, please share your results on this post so that future users can get the answer too!&lt;/P&gt;</description>
      <pubDate>Tue, 12 Oct 2021 16:31:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14047#M8599</guid>
      <dc:creator>Ryan_Chynoweth</dc:creator>
      <dc:date>2021-10-12T16:31:45Z</dc:date>
    </item>
    <item>
      <title>Re: Delta Upsert performance on empty table</title>
      <link>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14049#M8601</link>
      <description>&lt;P&gt;Hello @Kaniz Fatma,&lt;/P&gt;&lt;P&gt;Unfortunately, I did not investigate the subject any further. Given that the merge on an empty table is only done once, at the creation of the table, it wouldn't really matter, to be honest.&lt;/P&gt;</description>
      <pubDate>Thu, 19 May 2022 07:41:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/delta-upsert-performance-on-empty-table/m-p/14049#M8601</guid>
      <dc:creator>pantelis_mare</dc:creator>
      <dc:date>2022-05-19T07:41:29Z</dc:date>
    </item>
  </channel>
</rss>

