Is it required to run OPTIMIZE after doing GDPR DELETEs?

cristianc
Contributor

Greetings,

I have been reading the excellent article at https://docs.databricks.com/security/privacy/gdpr-delta.html?_ga=2.130942095.1400636634.1649068106-1... and my question is this: if GDPR DELETEs are performed on a table and they are the only change, is it required to run OPTIMIZE with ZORDER BY on the table again, or is the Z-ordering maintained?

Thanks in advance for your help,

Cristian

5 REPLIES

Hubert-Dudek
Esteemed Contributor III

After GDPR DELETE, please run VACUUM;

cristianc
Contributor

@Hubert Dudek thanks for the hint. Exactly as written in the article, VACUUM is required after the GDPR delete operation. However, do we need to run OPTIMIZE with ZORDER BY on the table again, or is the ordering maintained?

Hubert-Dudek
Esteemed Contributor III

No, because OPTIMIZE is not related to how the data is stored but to performance. It is a useful optimization after a delete, but you don't need to run it after every delete. OPTIMIZE can run, for example, once every 24 hours as a nightly maintenance job.
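The order of operations described in this reply can be sketched as follows. This is only an illustration: the helper names, the table name `events`, and the column `user_id` are hypothetical, and the functions merely build Delta Lake SQL strings that you would pass to something like `spark.sql(...)` in a notebook. The optional `retain_hours` argument is an assumption about how you might want to shorten the retention window; leaving it out keeps the table's default.

```python
from typing import List, Optional


def gdpr_delete_statements(
    table: str, predicate: str, retain_hours: Optional[int] = None
) -> List[str]:
    """Build the DELETE and the follow-up VACUUM for a GDPR erasure.

    Table name and predicate are assumed to be trusted input.
    """
    vacuum = f"VACUUM {table}"
    if retain_hours is not None:
        # Hypothetical override; omit to use the table's default retention.
        vacuum += f" RETAIN {retain_hours} HOURS"
    return [f"DELETE FROM {table} WHERE {predicate}", vacuum]


def nightly_optimize_statement(table: str, zorder_cols: List[str]) -> str:
    """Build the OPTIMIZE statement for a scheduled maintenance job."""
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_cols)})"


if __name__ == "__main__":
    # Run right after the erasure request.
    for stmt in gdpr_delete_statements("events", "user_id = 42"):
        print(stmt)
    # Run separately, for example once per night.
    print(nightly_optimize_statement("events", ["user_id"]))
```

Keeping the two helpers separate mirrors the advice above: VACUUM belongs to the delete workflow, while OPTIMIZE is scheduled independently.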

cristianc
Contributor

Thanks again for answering.

To give more context: imagine a really big table that is very expensive to fully OPTIMIZE with ZORDER BY.

On this table we do GDPR deletes on certain partitions, potentially quite a lot of them. The question is: if the only changes to those partitions are the GDPR deletes, is OPTIMIZE with ZORDER BY still required?

Optimization is never run on the whole table because of its size; we run it selectively, only on changed partitions. For inserts and updates it is clear that the order changes and Z-ordering is needed again, but what about deletes? Is a delete also a type of change that requires OPTIMIZE with ZORDER BY to be executed again, and if so, why?
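The selective approach described above, running OPTIMIZE only on changed partitions, could look roughly like this. It is a sketch under stated assumptions: the function name, the table `events`, the partition column `date`, and the Z-order column `user_id` are all hypothetical, partition values are assumed to be trusted strings, and the code only builds the SQL text (one statement per partition, using a simple equality predicate on the partition column) rather than executing it.

```python
from typing import Iterable, List


def optimize_statements_for_partitions(
    table: str,
    partition_col: str,
    changed_values: Iterable[str],
    zorder_cols: List[str],
) -> List[str]:
    """Build one OPTIMIZE ... WHERE ... ZORDER BY statement per changed partition."""
    zorder = ", ".join(zorder_cols)
    # Deduplicate and sort so each partition is optimized once, in a stable order.
    return [
        f"OPTIMIZE {table} WHERE {partition_col} = '{value}' ZORDER BY ({zorder})"
        for value in sorted(set(changed_values))
    ]


if __name__ == "__main__":
    touched = ["2022-04-02", "2022-04-01", "2022-04-01"]
    for stmt in optimize_statements_for_partitions("events", "date", touched, ["user_id"]):
        print(stmt)
```

Whether partitions touched only by deletes need to be in the `changed_values` list at all is exactly the open question in this thread.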

I did some more research, and according to this entry https://docs.databricks.com/release-notes/runtime/10.4.html#insertion-order-tags-are-now-preserved-f... Z-ordering is preserved in some cases as of DBR 10.4 LTS:

"The UPDATE and DELETE commands now preserve existing clustering information (including Z-ordering) for files that are updated or deleted. This is a best-effort approach and does not apply to cases when files are so small that they are combined during the update or delete."
