Is there any impact on data ingestion and data extraction while REORG TABLE is in progress?
11-14-2024 11:36 PM
We use Delta Lake for an eventing system with frequent updates and merges, and we have enabled deletion vectors to improve performance. With that comes the "REORG TABLE" maintenance task.
My question: in an ingestion- and extract-heavy system, when we run the sequence REORG TABLE -> OPTIMIZE -> VACUUM, is there any impact on event-data ingestion into Delta Lake, and correspondingly, is there any impact on data extraction from Delta Lake?
Are we expected to pause or back off ingestion and data extraction while REORG TABLE -> OPTIMIZE -> VACUUM is in progress, given that REORG TABLE essentially rewrites partitions to exclude soft-deleted data?
We are using Delta Lake 3.2 and Spark 3.5.
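For reference, the maintenance sequence described above can be sketched as three SQL statements run strictly in order. This is a minimal illustration, not a prescribed job: the table name `events` and the `run_sql` callable are hypothetical (in a PySpark job `spark.sql` would be the natural runner). The guard on `retain_hours` is a design choice of this sketch. It keeps VACUUM at or above Delta Lake's default 7-day (168-hour) retention, because a shorter window can delete files that long-running extracts reading an older snapshot still need. By default Delta Lake itself refuses a shorter retention unless its retention-duration check is disabled.

```python
from typing import Callable, List

MIN_RETAIN_HOURS = 168  # Delta Lake's default VACUUM retention (7 days)

def maintenance_statements(table: str, retain_hours: int = MIN_RETAIN_HOURS) -> List[str]:
    """Build the REORG TABLE -> OPTIMIZE -> VACUUM sequence for one Delta table."""
    if retain_hours < MIN_RETAIN_HOURS:
        # Shorter retention can delete files that extracts reading an older
        # snapshot still need, so this sketch refuses it outright.
        raise ValueError(f"retain_hours must be >= {MIN_RETAIN_HOURS}")
    return [
        # Rewrite files that carry deletion vectors, physically dropping soft-deleted rows.
        f"REORG TABLE {table} APPLY (PURGE)",
        # Compact the many small files produced by frequent merges and appends.
        f"OPTIMIZE {table}",
        # Remove files no longer referenced by the table and older than the retention window.
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
    ]

def run_maintenance(
    table: str,
    run_sql: Callable[[str], object],
    retain_hours: int = MIN_RETAIN_HOURS,
) -> List[str]:
    """Run the statements strictly in order and return what was executed."""
    executed: List[str] = []
    for stmt in maintenance_statements(table, retain_hours):
        run_sql(stmt)  # e.g. spark.sql(stmt) in a PySpark job (assumption)
        executed.append(stmt)
    return executed

if __name__ == "__main__":
    # Dry run with print as the runner; "events" is a hypothetical table name.
    run_maintenance("events", print)
```

Running the statements one after another, rather than concurrently, means OPTIMIZE compacts files that REORG TABLE has already purged, and VACUUM only then removes the superseded files.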
11-15-2024 08:11 AM
It is advisable to schedule REORG TABLE operations during periods of low activity to minimize disruption to both data ingestion and extraction.
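One way to act on this advice is to gate the maintenance job on a low-activity window. The sketch below is a minimal illustration. The function name `in_maintenance_window` and the 22:00 to 04:00 window are hypothetical, and the real window depends on your own ingestion and extraction traffic.

```python
from datetime import datetime, time

def in_maintenance_window(now: datetime, start: time, end: time) -> bool:
    """Return True if `now` falls inside [start, end).

    Handles windows that cross midnight, e.g. 22:00 to 04:00.
    Hypothetical helper: pick start/end from your own traffic patterns.
    """
    t = now.time()
    if start <= end:
        # Same-day window, e.g. 01:00 to 05:00.
        return start <= t < end
    # Window wraps past midnight.
    return t >= start or t < end

if __name__ == "__main__":
    # Hypothetical low-activity window: 22:00 to 04:00.
    if in_maintenance_window(datetime.now(), time(22, 0), time(4, 0)):
        print("inside window: start REORG TABLE -> OPTIMIZE -> VACUUM")
    else:
        print("outside window: skip maintenance this run")
```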
REORG TABLE can affect ongoing ingestion because it rewrites the table's underlying data files. If new data is ingested into the table at the same time, there may be write conflicts or delays while the system handles the reorganization and the ingestion concurrently. Conflicts are most likely when concurrent UPDATE, DELETE, or MERGE operations touch the same files that REORG TABLE is rewriting.
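If ingestion cannot be paused entirely, an alternative is to retry a conflicting write with backoff. The sketch below assumes a generic `op` callable and a caller-supplied tuple of retryable exception types. In the delta-spark Python package the conflict exceptions live in `delta.exceptions` (for example `ConcurrentAppendException`), but which ones are safe to retry depends on your writes; the `WriteConflict` class in the demo is a hypothetical stand-in. Retrying re-runs the whole operation, so this is only appropriate when the write is idempotent, for example a MERGE keyed on the event ID.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def retry_with_backoff(
    op: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `op`, retrying on `retryable` errors with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except retryable:
            if attempt == max_attempts:
                raise  # give up and surface the last conflict
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay;
            # sleeping a random fraction of it spreads concurrent writers apart.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")

if __name__ == "__main__":
    class WriteConflict(Exception):
        """Hypothetical stand-in for a Delta write-conflict exception."""

    attempts = {"n": 0}

    def flaky_write() -> str:
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise WriteConflict("concurrent REORG rewrote the same files")
        return "committed"

    print(retry_with_backoff(flaky_write, (WriteConflict,), base_delay=0.01))
```

Full jitter (sleeping a random amount up to the current delay) is used so that several ingestion workers hitting the same conflict do not all retry at the same instant.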

