04-05-2022 06:13 AM
Greetings,
I have been reading the excellent article at https://docs.databricks.com/security/privacy/gdpr-delta.html?_ga=2.130942095.1400636634.1649068106-1... My question is this: if GDPR DELETEs are performed on a table and they are the only change, do we need to run OPTIMIZE with ZORDER BY on the table again, or is the Z-ordering maintained?
Thanks in advance for your help,
Cristian
04-05-2022 06:15 AM
After the GDPR DELETE, please run VACUUM.
04-05-2022 06:16 AM
@Hubert Dudek thanks for the hint. Exactly as the article says, VACUUM is required after the GDPR delete operation. However, do we need to run OPTIMIZE with ZORDER BY on the table again, or is the ordering maintained?
04-05-2022 06:22 AM
No. OPTIMIZE is not about whether the data is stored or removed; it is about query performance. Running it after a delete is an optimization, but you don't need to do it after every delete. OPTIMIZE can run, for example, once every 24 hours as a nightly maintenance job.
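As a rough illustration of the order of operations above (DELETE first, then VACUUM, with OPTIMIZE left to a separate scheduled job), here is a minimal Python sketch that only builds the SQL statements. The table name `events`, the column `user_id`, the helper name `gdpr_delete_statements`, and the 168-hour retention value are assumptions made for the example; they do not come from the article.

```python
def gdpr_delete_statements(table, key_column, user_ids, retain_hours=168):
    """Build the DELETE and VACUUM statements for one erasure request.

    Hypothetical helper: table, column and retention are illustrative.
    """
    if not user_ids:
        raise ValueError("user_ids must not be empty")
    # Quote each id as a SQL string literal, doubling embedded single quotes.
    quoted = ", ".join("'" + str(u).replace("'", "''") + "'" for u in user_ids)
    delete_sql = f"DELETE FROM {table} WHERE {key_column} IN ({quoted})"
    # VACUUM only removes files older than the retention window.
    vacuum_sql = f"VACUUM {table} RETAIN {retain_hours} HOURS"
    return [delete_sql, vacuum_sql]


statements = gdpr_delete_statements("events", "user_id", ["u1", "u2"])
for statement in statements:
    print(statement)  # in a Databricks notebook you could pass this to spark.sql(...)
```

OPTIMIZE is deliberately left out of this helper, since it can run on its own schedule rather than after every delete.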
04-05-2022 08:54 AM
Thanks again for answering.
To give more context, imagine a really big table that is very expensive to fully optimize with Z-ordering.
On this table we do GDPR deletes on certain partitions, potentially quite a lot of them. The question is: if the only changes to those partitions are the GDPR deletes, is OPTIMIZE with ZORDER BY still required?
We never run the optimization on the whole table because of its size; we run it selectively, only on changed partitions. For inserts and updates it is clear that the order changes and Z-ordering has to be applied again, but what about deletes? Is a delete also a type of change that requires OPTIMIZE with ZORDER BY to be run again, and if so, why?
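For reference, the selective run described above can be sketched roughly as follows: one OPTIMIZE statement per changed partition, each restricted with a WHERE predicate on the partition column. The table `events`, the partition column `event_date`, the Z-order column `user_id`, and the helper name `optimize_statements` are assumptions made for the example, not our real schema.

```python
def optimize_statements(table, partition_column, changed_partitions, zorder_columns):
    """Build one OPTIMIZE ... ZORDER BY statement per changed partition.

    Hypothetical helper: all names passed in are illustrative.
    """
    if not zorder_columns:
        raise ValueError("zorder_columns must not be empty")
    zorder = ", ".join(zorder_columns)
    statements = []
    # De-duplicate and sort so the job is deterministic and each partition runs once.
    for value in sorted(set(changed_partitions)):
        literal = "'" + str(value).replace("'", "''") + "'"
        statements.append(
            f"OPTIMIZE {table} WHERE {partition_column} = {literal} ZORDER BY ({zorder})"
        )
    return statements


changed = ["2022-04-02", "2022-04-01", "2022-04-02"]
optimize_sql = optimize_statements("events", "event_date", changed, ["user_id"])
for statement in optimize_sql:
    print(statement)  # in a Databricks notebook you could pass this to spark.sql(...)
```

Whether partitions touched only by deletes need to be in the `changed` list at all is exactly the open question.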
04-13-2022 05:57 AM
I did some more research. According to this release-notes entry, https://docs.databricks.com/release-notes/runtime/10.4.html#insertion-order-tags-are-now-preserved-f... as of DBR 10.4 LTS the Z-ordering is preserved in some cases:
"
The UPDATE and DELETE commands now preserve existing clustering information (including Z-ordering) for files that are updated or deleted. This is a best-effort approach and does not apply to cases when files are so small that they are combined during the update or delete.
"