Warehousing & Analytics

Does "Merge Into" skip files when reading target table to find files to be touched?

gmiguel
Contributor

I've been doing some testing with Partitions vs Z-Ordering to optimize the merge process.
As the documentation says, tables smaller than 1 TB should not be partitioned and can instead rely on Z-Ordering to optimize reads.
Analyzing the merge process, I found that even after Z-Ordering, the target table is always read in full to perform the join with the modified data. This means that if a single record changes and the target table is 100 GB, the merge reads all 100 GB to identify the files that need to be rewritten, without using the column statistics for file skipping.

This behavior seems odd to me, but it's what I found when analyzing the merge execution plan.

Good old-fashioned partitioning still seems to be better suited to merge processes.

 

1 ACCEPTED SOLUTION


I've found the answer I was looking for.

https://docs.databricks.com/en/optimizations/dynamic-file-pruning.html

For MERGE, UPDATE and DELETE statements, Dynamic File Pruning is applied only when Photon is enabled.
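
For anyone landing here later, this is a rough sketch in plain Python of the idea behind file pruning during a merge: take the keys that show up in the source at runtime and use per-file min/max statistics to skip target files that cannot contain them. It is only a toy model, not how Databricks or Photon actually implement it. `FileStats`, `files_to_scan`, the file names and the sizes are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical stand-in for the per-file min/max statistics kept in the Delta log."""
    path: str
    min_key: int
    max_key: int
    size_gb: float


def files_to_scan(files, source_keys):
    """Keep only the files whose [min_key, max_key] range could contain a source key."""
    keys = set(source_keys)
    if not keys:
        return []
    return [
        f for f in files
        if any(f.min_key <= k <= f.max_key for k in keys)
    ]


# Toy target table: 4 files of 25 GB each, clustered on the merge key
# so that the key ranges of the files do not overlap.
target_files = [
    FileStats("part-0", 0, 249, 25.0),
    FileStats("part-1", 250, 499, 25.0),
    FileStats("part-2", 500, 749, 25.0),
    FileStats("part-3", 750, 999, 25.0),
]

changed_keys = [612]  # a single modified record in the source
candidates = files_to_scan(target_files, changed_keys)
scanned_gb = sum(f.size_gb for f in candidates)
print([f.path for f in candidates], scanned_gb)
```

In this toy case only one of the four files has to be read. If the key ranges of the files overlapped heavily (no Z-Ordering or other clustering on the merge key), the same check would keep almost every file, so the statistics only help when the data is laid out accordingly.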

Thank you


2 REPLIES 2

@Retired_mod 

I think you forgot to write down your thoughts...

Going further...

Is there any improvement on the roadmap to speed up the merge? It doesn't make sense to have column statistics and still rely only on partition pruning to narrow down the data scanned. That wastes a lot of compute, given that file skipping based on the statistics in the Delta log could be used here.

