Databricks Community

pooja_bhumandla · ‎07-25-2025

Hi Databricks Community,

I’m analyzing the performance of Delta Lake MERGE operations on a partitioned table, and I observed unexpected behavior across 3 test cases.

I wanted to share my findings to better understand:

Why ZORDER or Deletion Vectors helped or hurt performance
What factors explain the execution time behavior
What optimizations might improve performance in a partitioned table setup

Table Setup:

Table is partitioned
Merge condition targets only 17 specific rows
All merges are on the same table — only the optimization strategy changes

Case 1: Merge After Optimizing (No ZORDER, No Deletion Vectors)

Metric Value

numTargetRowsCopied 5.9 million

executionTimeMs 194,757 ms

rewriteTimeMs 51,049 ms

scanTimeMs 7,974 ms

numTargetFilesAdded 1

Observation: Reasonable performance but still rewrites a large portion of data even though only 17 rows were matched.

Case 2: Merge After ZORDER on 4 Columns (No Deletion Vectors)
Metric Value
numTargetRowsCopied 30.3 million
executionTimeMs 202,994 ms
rewriteTimeMs 113,906 ms
scanTimeMs 11,715 ms
numTargetFilesAdded 7
Observation: Worse performance than Case 1 — even though ZORDER was applied, more rows were rewritten

Case 3: Merge After ZORDER + Deletion Vectors Enabled
Metric Value
numTargetRowsCopied 15 million
executionTimeMs 264,358 ms
rewriteTimeMs 122,688 ms
scanTimeMs 19,342 ms
numTargetFilesAdded 1
Observation: Despite enabling Deletion Vectors, which should help by avoiding full rewrites, the execution time actually increased.

Why did enabling Deletion Vectors result in higher execution time, even though fewer files were touched and fewer rows were copied than in Case 2?
Why does ZORDERing seem to increase the number of rows copied in Case 2, even though it improves scan locality?
What are the best practices to ensure that ZORDER and Deletion Vectors actually improve performance, especially when updating a very small number of rows?

Would love to hear your insights and suggestions!

Thanks in advance!

radothede · ‎07-31-2025

Hi @pooja_bhumandla

Thanks for such a nice and detailed description of Your case, that really helps to understand the scenario 🙂

Regarding Your questions:

1)

Overall operation could become more complex due to:

a) deletion vector creation and maintenance,

b) metadata operations,

c) any background optimization processes.

2)

ZORDER operation is phisically redistributing Your rows across new files (creating new delta table version) created by OPTIMIZE command (thats why we can see 7 files rewritten in case 2) in a given order - this is why your scan operations are faster (improved reading capabilities on zorder-ed table).

Merge operation had to rewrite more files and more data so it took longer and downgraded overall performance of Your operation (your rows got redistributed across multiple parquet files).

3)

I would recommend performing below checks / steps:

a) ZORDER keys are in line with merge patterns (keys),

b) use smaller target file sizes (using tuneFileSizesForRewrites or maxFileSize) - data layout under your delta table can have huge impact on merge performance (ref to this topic: delta-lake-file-sizes-optimize-maxfilesize)

c) consider running a test with liquid clustering or different partition alignment wich will fit your merge process,

d) test on hive-style partitioned and optimized table with no zorder and different maxFileSize (32 MB, 64MB, 128 MB - depends of your table size) or tuneFileSizesForRewrites set to true,

e) test and adapt what works best for Your specific scenario.

Best,

Radek.

Databricks Community

Performance Behavior of MERGE with Partitioned Table: Impact of ZORDER and Deletion Vectors

Join Us as a Local Community Builder!

Join us for another BrickTalk: Vibe-Coding Databricks Apps in Replit with Augusto!

🌟 Community Pulse: Your Weekly Roundup! November 14 – 20, 2025

Celebrating Our First Brickster Champion: Louis Frolio

⭐ Setup Spark with Hadoop Anywhere : A DBR aligned local Spark+HDFS+Hive stack on Docker⭐

Big Book of Data Engineering - Get how-tos, code snippets and real-world examples