3 weeks ago
Hi,
I am currently designing a PII governance framework to meet CCPA compliance requirements on Databricks. I understand that Databricks provides mechanisms such as VACUUM and Deletion Vectors combined with REORG … APPLY (PURGE) to permanently remove data. With a well-designed deletion workflow across the Lakeflow / medallion architecture, end-to-end PII deletion can be achieved.
However, I would like to understand whether Databricks offers any native feature, service, or managed capability that can reduce the operational overhead of implementing and maintaining this workflow, and help centrally orchestrate and enforce PII deletions across the entire lakehouse, rather than relying primarily on custom pipelines and control tables.
Thanks in advance and really appreciate your response.
3 weeks ago
Hi @abhijit007 ,
A new Data Classification feature (currently in Public Preview) lets you automatically classify and tag sensitive data in your catalog. It goes through a few steps:
system.data_classification.results;
Check for more details:
Best regards,
3 weeks ago
Hi @abhijit007,
No, Databricks still does NOT provide a native, centralized PII deletion orchestration service across the lakehouse. As you rightly mentioned, though, it is achievable through custom pipelines and control tables.
Check this - Prepare your data for GDPR compliance | Databricks on AWS
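To make the "custom pipelines and control tables" idea a bit more concrete, here is a minimal sketch, not a definitive implementation. It assumes a hypothetical control table whose rows are modeled by `DeletionRequest`, and it only builds the SQL strings (DELETE, then REORG TABLE ... APPLY (PURGE), then VACUUM) for each affected table. The class, function, table, and column names are all made up for illustration.

```python
import re
from dataclasses import dataclass

# Conservative patterns so that values can be embedded in SQL text safely.
# A real pipeline would more likely use parameterized queries.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}$")
_SUBJECT_ID = re.compile(r"^[A-Za-z0-9_-]+$")


@dataclass(frozen=True)
class DeletionRequest:
    """One row of a hypothetical control table of pending erasure requests."""
    table: str        # table name, e.g. catalog.schema.table (assumed layout)
    key_column: str   # column that identifies the data subject
    subject_id: str   # identifier of the consumer whose rows must be removed


def build_purge_statements(requests, retain_hours=168):
    """Return the SQL statements to run, grouped per (table, key column).

    For each group: delete the rows, rewrite files so soft-deleted rows are
    physically dropped, then vacuum files older than the retention window.
    """
    grouped = {}
    for req in requests:
        if not _IDENTIFIER.match(req.table) or not _IDENTIFIER.match(req.key_column):
            raise ValueError(f"unsafe identifier in request: {req}")
        if not _SUBJECT_ID.match(req.subject_id):
            raise ValueError(f"unsafe subject id in request: {req}")
        grouped.setdefault((req.table, req.key_column), []).append(req.subject_id)

    statements = []
    for (table, key_column), ids in grouped.items():
        id_list = ", ".join(f"'{i}'" for i in ids)
        statements.append(f"DELETE FROM {table} WHERE {key_column} IN ({id_list})")
        statements.append(f"REORG TABLE {table} APPLY (PURGE)")
        statements.append(f"VACUUM {table} RETAIN {retain_hours} HOURS")
    return statements


if __name__ == "__main__":
    pending = [
        DeletionRequest("main.silver.customers", "customer_id", "c-1"),
        DeletionRequest("main.gold.orders", "customer_id", "c-1"),
    ]
    for sql in build_purge_statements(pending):
        print(sql)
```

In a Databricks job, each returned string could be passed to `spark.sql(...)`, and the control table updated once all statements for a request succeed. The 168-hour default here mirrors the usual 7-day VACUUM retention, so the old files are only physically removed after that window has passed.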
Thanks.
3 weeks ago
Hi @Sumit_7 ,
Thanks for the details. It's helpful.
3 weeks ago
Hi @aleksandra_ch ,
Thanks. The notebook reference is helpful.