To make VACUUM run faster:
(1) avoid over-partitioned directories
(2) avoid concurrent runs while the VACUUM command is running
(3) avoid enabling S3 versioning (Delta Lake already maintains its own history)
(4) run the "optimize" command periodically
(5) enable autoCompaction/autoOptimize on the Delta table
(6) use the latest (or a higher) DBR with an auto-scaling cluster (for faster listing) and compute-optimized instance types.
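As a rough sketch of points (4) and (5), the maintenance statements could be generated like this. The table name `sales.events` and the 168-hour retention are only illustrative assumptions, and the helper `maintenance_statements` is hypothetical; the table properties are the `delta.autoOptimize.*` ones as I understand them on Databricks, so check them against your DBR version.

```python
from typing import List


def maintenance_statements(table: str, retain_hours: int = 168) -> List[str]:
    """Build SQL for points (4) and (5), followed by the VACUUM itself.

    `table` and `retain_hours` are illustrative; adjust them to your setup.
    """
    return [
        # (5) enable optimized writes and auto compaction on the table
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        "'delta.autoOptimize.optimizeWrite' = 'true', "
        "'delta.autoOptimize.autoCompact' = 'true')",
        # (4) compact small files so VACUUM has fewer files to list
        f"OPTIMIZE {table}",
        # finally, remove files older than the retention window
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
    ]


# In a Databricks notebook, where `spark` is already defined, you could run:
#   for stmt in maintenance_statements("sales.events"):
#       spark.sql(stmt)
```

Running the three statements from a single scheduled job also helps with point (2), since the VACUUM is less likely to overlap with another maintenance run.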
Also, the default checkpointInterval is currently 100, but on a lower DBR it would be 10. In that case you can alter this property to 100 so that checkpoint files are created every 100 commits.
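A minimal sketch of altering that property, assuming it is exposed as the table property `delta.checkpointInterval`. The helper name and the table name `sales.events` are hypothetical.

```python
def checkpoint_interval_statement(table: str, interval: int = 100) -> str:
    """Build the ALTER TABLE statement that sets the checkpoint interval.

    `interval` is the number of commits between checkpoint files.
    """
    if interval < 1:
        raise ValueError("interval must be a positive number of commits")
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        f"('delta.checkpointInterval' = '{interval}')"
    )


# In a Databricks notebook, where `spark` is already defined, you could run:
#   spark.sql(checkpoint_interval_statement("sales.events"))
```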
- Since VACUUM is compute-intensive, use compute-optimized instance types such as the C5 series (on AWS).