<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Why Your Delta MERGE is 5x Slower Than an Overwrite (And How to Fix It) in Community Articles</title>
    <link>https://community.databricks.com/t5/community-articles/why-your-delta-merge-is-5x-slower-than-an-overwrite-and-how-to/m-p/158230#M1234</link>
    <description>&lt;P&gt;Hey everyone,&lt;/P&gt;&lt;P&gt;We’ve all been there: a Delta Lake &lt;FONT face="arial black,avant garde" color="#993300"&gt;MERGE&lt;/FONT&gt; job that should take 20 minutes drags on for &lt;STRONG&gt;90 minutes&lt;/STRONG&gt;, while a full overwrite of the same table finishes in under 20. When an overwrite outpaces a selective merge, it's a red flag that the merge is reading and rewriting far more data than the change set requires. This usually happens because the engine is scanning unnecessary partitions and opening thousands of small files.&lt;/P&gt;&lt;P&gt;To fix this, force partition pruning explicitly: compute the date bounds of your source batch upfront and pass them as literal values directly in your &lt;FONT face="arial black,avant garde" color="#993300"&gt;MERGE&lt;/FONT&gt; condition. Joining on &lt;FONT face="batang,apple gothic" color="#993300"&gt;target.date = source.date&lt;/FONT&gt; alone wasn't enough for the optimizer to prune the target; adding literal date ranges dropped our target scan from 580 partitions to just 31.&lt;/P&gt;&lt;P&gt;Additionally, if your high-cardinality merge keys are scattered randomly across files, even a small batch of updates can force Spark to rewrite hundreds of files. You can reduce this I/O overhead by running a targeted &lt;FONT face="arial black,avant garde" color="#993300"&gt;OPTIMIZE&lt;/FONT&gt; with &lt;FONT face="arial black,avant garde" color="#993300"&gt;ZORDER BY&lt;/FONT&gt; on the merge keys, scoped strictly to your active ingestion window.&lt;/P&gt;&lt;P&gt;If your SLAs are slipping, first check the &lt;FONT color="#993300"&gt;numTargetFilesScanned&lt;/FONT&gt; metric in the table history, then look for a small average file size (sizeInBytes divided by numFiles) using &lt;FONT face="arial black,avant garde" color="#993300"&gt;DESCRIBE DETAIL&lt;/FONT&gt;.&lt;/P&gt;&lt;P&gt;I published a full architectural deep dive on how to read these metrics in the Spark UI and why this happens over on Medium: &lt;A class="" href="https://medium.com/@avinash.narala6814/databricks-merge-was-5x-slower-than-an-overwrite-the-hidden-mistake-that-was-killing-our-sla-179585e8d57a" target="_self"&gt;Databricks MERGE Was 5x Slower Than an Overwrite — The Hidden Mistake That Was Killing Our SLA&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;What strategies are you all using to keep your Delta merges selective as your datasets scale past hundreds of millions of rows? Let's discuss below!&lt;/P&gt;</description>
    <pubDate>Wed, 03 Jun 2026 16:30:12 GMT</pubDate>
    <dc:creator>Avinash_Narala</dc:creator>
    <dc:date>2026-06-03T16:30:12Z</dc:date>
    <item>
      <title>Why Your Delta MERGE is 5x Slower Than an Overwrite (And How to Fix It)</title>
      <link>https://community.databricks.com/t5/community-articles/why-your-delta-merge-is-5x-slower-than-an-overwrite-and-how-to/m-p/158230#M1234</link>
      <pubDate>Wed, 03 Jun 2026 16:30:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/why-your-delta-merge-is-5x-slower-than-an-overwrite-and-how-to/m-p/158230#M1234</guid>
      <dc:creator>Avinash_Narala</dc:creator>
      <dc:date>2026-06-03T16:30:12Z</dc:date>
    </item>
  </channel>
</rss>

