Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Is there any impact on data ingestion and data extraction while REORG TABLE is in progress?

SagarJi
New Contributor II

While using Delta Lake for an eventing system with repeated updates and merges, we use deletion vectors to improve performance. With that comes the "REORG TABLE" maintenance task.
My question: in an ingestion- and extract-heavy system, when we run the sequence REORG TABLE -> OPTIMIZE -> VACUUM, is there any impact on event-data ingestion into Delta Lake, and correspondingly on data extraction from Delta Lake?
Are we expected to pause or back off ingestion and data extraction while REORG TABLE -> OPTIMIZE -> VACUUM is in progress, considering that REORG TABLE essentially rewrites partitions, excluding the soft-deleted data?
We are using Delta 3.2 and Spark 3.5.


Walter_C
Databricks Employee

It is advisable to schedule REORG TABLE operations during periods of low activity to minimize disruptions to both data ingestion and extraction processes.

REORG TABLE rewrites the table's underlying data files, so it can affect ingestion that runs at the same time. If new data is being written to the table during the rewrite, some commits may conflict or be delayed while the reorganization and the ingestion are reconciled.
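To make the advice above concrete, here is a minimal sketch in Python of running the REORG TABLE -> OPTIMIZE -> VACUUM sequence from the question, retrying a step with exponential backoff when it hits a write conflict. Several things here are assumptions rather than anything from this thread: the table name `events`, the 168-hour retention, the helper names, and the heuristic that treats any exception whose class name contains "Concurrent" as a retryable conflict. `run_sql` is any callable that executes one SQL statement, for example `spark.sql`.

```python
import time
from typing import Callable, List


def maintenance_statements(table: str, retain_hours: int = 168) -> List[str]:
    """Return the REORG -> OPTIMIZE -> VACUUM statements in run order."""
    return [
        f"REORG TABLE {table} APPLY (PURGE)",  # rewrite files with soft-deleted rows
        f"OPTIMIZE {table}",                   # compact small files
        f"VACUUM {table} RETAIN {retain_hours} HOURS",  # drop unreferenced files
    ]


def looks_like_conflict(exc: Exception) -> bool:
    # Assumption: conflict errors are recognised by class name only,
    # e.g. ConcurrentAppendException. Adjust to your environment.
    return "Concurrent" in type(exc).__name__


def run_with_retry(
    run_sql: Callable[[str], object],
    statement: str,
    is_conflict: Callable[[Exception], bool] = looks_like_conflict,
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run one statement, retrying on conflicts. Returns the attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            run_sql(statement)
            return attempt
        except Exception as exc:
            # Re-raise anything that is not a conflict, or the final failure.
            if not is_conflict(exc) or attempt == max_attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise ValueError("max_attempts must be at least 1")


def run_maintenance(run_sql: Callable[[str], object], table: str) -> None:
    """Run the three maintenance steps in order, each with retry."""
    for statement in maintenance_statements(table):
        run_with_retry(run_sql, statement)


# Hypothetical usage with an existing SparkSession named `spark`:
# run_maintenance(spark.sql, "events")
```

The same retry wrapper could be put around the ingestion writes instead of (or as well as) the maintenance job, which lets ingestion back off for a short time rather than being paused for the whole maintenance window.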
