How does running VACUUM on Delta Lake tables affect read/write performance?
06-10-2021 02:47 PM
If I don't run VACUUM on a Delta Lake table, will that make my read performance slower?
- Labels: Read data
06-17-2021 03:39 PM
VACUUM does not have a direct impact on read/write performance, since it only removes files that are no longer referenced by a Delta table (unless your data volume is so high that you are hitting the read limits of the underlying S3/GCS/ADLS buckets). It makes sense to run it as a separate job, scheduled daily, and potentially on spot instances.
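As a rough illustration of what such a scheduled job could run, here is a minimal sketch that only builds the `VACUUM` statement. The helper name `build_vacuum_sql` and the table name `my_db.events` are made up for this example. The 168-hour default mirrors Delta Lake's default 7-day retention threshold, and `DRY RUN` lists the files that would be deleted without removing anything.

```python
def build_vacuum_sql(table_name: str, retain_hours: int = 168, dry_run: bool = False) -> str:
    """Build a Delta Lake VACUUM statement for a scheduled maintenance job.

    table_name   -- hypothetical table identifier, e.g. "my_db.events"
    retain_hours -- retention window in hours; 168 (7 days) matches the Delta default
    dry_run      -- if True, append DRY RUN so files are only listed, not deleted
    """
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    statement = f"VACUUM {table_name} RETAIN {retain_hours} HOURS"
    if dry_run:
        statement += " DRY RUN"
    return statement


if __name__ == "__main__":
    # Preview the statement the job would submit.
    print(build_vacuum_sql("my_db.events", dry_run=True))
```

In a daily job with an active SparkSession named `spark` (assumed here), the statement would be submitted with something like `spark.sql(build_vacuum_sql("my_db.events"))`. Starting with a dry run is a cautious way to check how many files have accumulated before deleting anything.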
06-23-2021 02:24 PM
VACUUM has no effect on read/write performance for that table. Never running VACUUM will not make reads from or writes to a Delta Lake table any slower.
If you run VACUUM very infrequently, each VACUUM run may itself take quite a long time, since more unreferenced files build up between runs, so it is suggested to run VACUUM somewhat regularly. How often you should run it depends on your storage costs: files that VACUUM has not yet removed keep taking up storage.