cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Performance Behavior of MERGE with Partitioned Table: Impact of ZORDER and Deletion Vectors

pooja_bhumandla
New Contributor II

Hi Databricks Community,

Iโ€™m analyzing the performance of Delta Lake MERGE operations on a partitioned table, and I observed unexpected behavior across 3 test cases.

I wanted to share my findings to better understand:

Why ZORDER or Deletion Vectors helped or hurt performance
What factors explain the execution time behavior
What optimizations might improve performance in a partitioned table setup

Table Setup:
  • Table is partitioned
  • Merge condition targets only 17 specific rows
  • All merges are on the same table โ€” only the optimization strategy changes
Case 1: Merge After Optimizing (No ZORDER, No Deletion Vectors)
Metric Value
numTargetRowsCopied 5.9 million
executionTimeMs 194,757 ms
rewriteTimeMs 51,049 ms
scanTimeMs 7,974 ms
numTargetFilesAdded 1
Observation: Reasonable performance but still rewrites a large portion of data even though only 17 rows were matched.

Case 2: Merge After ZORDER on 4 Columns (No Deletion Vectors)
Metric Value
numTargetRowsCopied 30.3 million
executionTimeMs 202,994 ms
rewriteTimeMs 113,906 ms
scanTimeMs 11,715 ms
numTargetFilesAdded 7
Observation: Worse performance than Case 1 โ€” even though ZORDER was applied, more rows were rewritten
 
Case 3: Merge After ZORDER + Deletion Vectors Enabled
Metric Value
numTargetRowsCopied 15 million
executionTimeMs 264,358 ms
rewriteTimeMs 122,688 ms
scanTimeMs 19,342 ms
numTargetFilesAdded 1
Observation: Despite enabling Deletion Vectors, which should help by avoiding full rewrites, the execution time actually increased.

  1. Why did enabling Deletion Vectors result in higher execution time, even though fewer files were touched and fewer rows were copied than in Case 2?
  2. Why does ZORDERing seem to increase the number of rows copied in Case 2, even though it improves scan locality?
  3. What are the best practices to ensure that ZORDER and Deletion Vectors actually improve performance, especially when updating a very small number of rows?

Would love to hear your insights and suggestions!

Thanks in advance!

 

1 REPLY 1

radothede
Valued Contributor II

Hi @pooja_bhumandla 

Thanks for such a nice and detailed description of Your case, that really helps to understand the scenario ๐Ÿ™‚

Regarding Your questions:

1)

Overall operation could become more complex due to:

a) deletion vector creation and maintenance,

b) metadata operations,

c) any background optimization processes.

 

2)

ZORDER operation is phisically redistributing Your rows across new files (creating new delta table version) created by OPTIMIZE command (thats why we can see 7 files rewritten in case 2) in a given order - this is why your scan operations are faster (improved reading capabilities on zorder-ed table).

Merge operation had to rewrite more files and more data so it took longer and downgraded overall performance of Your operation (your rows got redistributed across multiple parquet files).

 

3)

I would recommend performing below checks / steps:

a) ZORDER keys are in line with merge patterns (keys),

b) use smaller target file sizes (using tuneFileSizesForRewrites or maxFileSize) - data layout under your delta table can have huge impact on merge performance (ref to this topic: delta-lake-file-sizes-optimize-maxfilesize)

c) consider running a test with liquid clustering or different partition alignment wich will fit your merge process,

d) test on hive-style partitioned and optimized table with no zorder and different maxFileSize (32 MB, 64MB, 128 MB - depends of your table size) or tuneFileSizesForRewrites set to true,

e) test and adapt what works best for Your specific scenario.

Best,

Radek.

 

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local communityโ€”sign up today to get started!

Sign Up Now