Delta Lake completes a MERGE in two steps:
- Perform an inner join between the target table and source table to select all files that have matches.
- Perform an outer join between the selected files in the target and source tables and write out the updated/deleted/inserted data.
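
The two steps above can be pictured with a toy model in plain Python. This is only an illustrative sketch, not Delta Lake's implementation: the file names, keys, and values are made up, each "file" is a small dictionary, and the merge is assumed to be a simple upsert on a single key.

```python
# Toy model of the two MERGE steps; not Delta Lake's actual implementation.
# Each target "file" maps a file name to the rows (key -> value) stored in it.
target_files = {
    "part-0": {1: "a", 2: "b"},
    "part-1": {3: "c", 4: "d"},
    "part-2": {5: "e"},
}
# Source rows to upsert, keyed the same way (hypothetical data).
source = {2: "B", 5: "E", 9: "new"}

# Step 1: inner join on the key to find the files that contain a match.
touched = sorted(
    name for name, rows in target_files.items()
    if any(key in source for key in rows)
)

# Step 2: outer join between the rows of the touched files and the source.
old_rows = {k: v for name in touched for k, v in target_files[name].items()}
rewritten = {}
for key in sorted(old_rows.keys() | source.keys()):
    if key in source:
        # Matched row is updated; source-only row is inserted.
        rewritten[key] = source[key]
    else:
        # Target-only row is copied unchanged into the new file.
        rewritten[key] = old_rows[key]

print(touched)
print(rewritten)
```

Note that `part-1` has no matching keys, so it is never selected in step 1 and is not rewritten in step 2. Unmatched rows in the selected files (key `1` here) still have to be copied, which is why the number of files selected in step 1 drives the cost of step 2.
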
If finding the files that Delta Lake needs to rewrite is taking too long, try:
- Add more predicates to narrow down the search space.
- Adjust shuffle partitions.
- Adjust broadcast join thresholds.
- Right-size the files (balance too many small files against too few large files).
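
As one illustration of narrowing the search space, the sketch below builds a MERGE statement whose `ON` clause carries an extra predicate on the target. The helper `build_merge_sql`, the table names `events` and `updates`, and the columns `event_id` and `event_date` are all hypothetical; the example assumes the target is partitioned (or otherwise laid out) by `event_date`, so the predicate lets the engine skip files.

```python
from typing import Sequence


def build_merge_sql(
    target: str,
    source: str,
    key: str,
    extra_predicates: Sequence[str] = (),
) -> str:
    """Build an upsert MERGE statement with optional extra ON-clause predicates.

    Hypothetical helper for illustration; it only assembles a SQL string.
    """
    # The key equality comes first, followed by any narrowing predicates.
    condition = " AND ".join([f"t.{key} = s.{key}", *extra_predicates])
    return (
        f"MERGE INTO {target} AS t USING {source} AS s ON {condition} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Restrict the match to recent data in the (hypothetical) target table.
sql = build_merge_sql(
    "events", "updates", "event_id", ["t.event_date >= '2024-01-01'"]
)
print(sql)
```

With a live Spark session, the string could be passed to `spark.sql(sql)`. Keep in mind that a predicate in the `ON` clause changes what counts as a match: a source row whose counterpart in the target falls outside the predicate is treated as not matched and would be inserted, so the extra predicate should only exclude data that the source cannot touch.
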
If rewriting the files is taking too long, try:
- Adjust shuffle partitions or enable Adaptive Query Execution (AQE).
- Enable optimized writes.
- Adjust broadcast join thresholds.
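
The settings in this list map to Spark session configuration keys. The sketch below collects them in a dictionary and applies them through any setter with the shape of `spark.conf.set`. The values (400 partitions, a 64 MiB broadcast threshold) are placeholders rather than recommendations, and the helper `apply_settings` is a hypothetical name. The optimized-writes key is the one used on Databricks and in recent Delta Lake releases; check that your version supports it.

```python
from typing import Callable, Dict

# Placeholder values for illustration only; tune them for your workload.
WRITE_PHASE_SETTINGS: Dict[str, str] = {
    # Let AQE coalesce or split shuffle partitions at runtime.
    "spark.sql.adaptive.enabled": "true",
    # Number of shuffle partitions used by the join.
    "spark.sql.shuffle.partitions": "400",
    # Optimized writes (assumes a platform/version that supports this key).
    "spark.databricks.delta.optimizeWrite.enabled": "true",
    # Maximum table size, in bytes, that Spark will broadcast in a join.
    "spark.sql.autoBroadcastJoinThreshold": str(64 * 1024 * 1024),
}


def apply_settings(
    set_conf: Callable[[str, str], None], settings: Dict[str, str]
) -> None:
    """Apply each key/value pair through the given setter."""
    for key, value in settings.items():
        set_conf(key, value)


# With a live session: apply_settings(spark.conf.set, WRITE_PHASE_SETTINGS)
# Here the settings are recorded in a plain dict so the sketch runs anywhere.
recorded: Dict[str, str] = {}
apply_settings(recorded.__setitem__, WRITE_PHASE_SETTINGS)
print(recorded)
```

Passing the setter as an argument keeps the sketch runnable without a cluster; the same shuffle and broadcast keys also apply when tuning the file-finding step.
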