Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.
GDPR/CCPA Compliance Delete for PII data

abhijit007
Databricks Partner

Hi,

I am currently designing a PII governance framework to meet CCPA compliance requirements on Databricks. I understand that Databricks provides mechanisms such as VACUUM and Deletion Vectors combined with REORG … APPLY (PURGE) to permanently remove data. With a well‑designed deletion workflow across the Lakeflow / medallion architecture, end‑to‑end PII deletion can be achieved.

However, I would like to understand whether Databricks offers any native feature, service, or managed capability that can reduce the operational overhead of implementing and maintaining this workflow, and help centrally orchestrate and enforce PII deletions across the entire lakehouse, rather than relying primarily on custom pipelines and control tables.

Thanks in advance; I really appreciate your response.

Accepted Solutions

aleksandra_ch
Databricks Employee

Hi @abhijit007 ,

A new Data Classification feature (currently in Public Preview) lets you automatically classify and tag sensitive data in your catalog. It works in a few steps:

  1. An AI-driven engine scans Unity Catalog tables, detects PII, and assigns classification tags.
  2. The classification results are stored in the system table system.data_classification.results.
  3. You can apply ABAC policies based on those tags to mask or filter PII.
  4. You can use the system table to automate the removal of data subject to GDPR.
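
Step 4 could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a documented Databricks workflow: it assumes the rows read from system.data_classification.results expose `catalog_name`, `schema_name`, and `table_name` fields (check the actual schema in your workspace), and that every flagged table carries a subject key column, here a hypothetical `customer_id`. The helper names are illustrative. It only builds the DELETE statements; executing them and purging the underlying files are separate steps.

```python
from typing import Iterable, List, Mapping


def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_delete_statements(
    classification_rows: Iterable[Mapping[str, str]],
    subject_key_column: str,
    subject_id: int,
) -> List[str]:
    """Build one DELETE per distinct table that has a classified column.

    Assumed (hypothetical) row fields: catalog_name, schema_name, table_name.
    """
    # Several classified columns may belong to the same table, so deduplicate.
    tables = sorted(
        {
            (row["catalog_name"], row["schema_name"], row["table_name"])
            for row in classification_rows
        }
    )
    statements = []
    for catalog, schema, table in tables:
        full_name = ".".join(quote_ident(part) for part in (catalog, schema, table))
        statements.append(
            f"DELETE FROM {full_name} "
            f"WHERE {quote_ident(subject_key_column)} = {int(subject_id)}"
        )
    return statements


# Example rows shaped like the assumed system-table output (made-up values).
rows = [
    {"catalog_name": "main", "schema_name": "sales", "table_name": "customers", "column_name": "email"},
    {"catalog_name": "main", "schema_name": "sales", "table_name": "customers", "column_name": "phone"},
    {"catalog_name": "main", "schema_name": "crm", "table_name": "contacts", "column_name": "email"},
]
statements = build_delete_statements(rows, "customer_id", 42)
for stmt in statements:
    print(stmt)
```

In a notebook, the rows would come from a query against the system table and each statement could be passed to `spark.sql`. In practice the subject key column will likely differ per table, so a mapping from table to key column would replace the single `subject_key_column` argument.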

Check for more details:

Best regards,

4 REPLIES

Sumit_7
Honored Contributor

Hi @abhijit007,

No, Databricks does not currently provide a native, centralized service for orchestrating PII deletion across the lakehouse. As you mentioned, though, it is achievable with custom pipelines and control tables.
See: Prepare your data for GDPR compliance | Databricks on AWS
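
As a rough illustration of the custom-pipeline approach, here is a minimal sketch that turns pending requests from a control table into the DELETE, REORG ... APPLY (PURGE), and VACUUM sequence mentioned in the question. The `DeletionRequest` shape, the table names, and the 168-hour retention value are assumptions for illustration only; adapt them to your own control table and retention policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DeletionRequest:
    """One row of a hypothetical control table of pending erasure requests."""
    table: str        # fully qualified table name, e.g. "main.sales.customers"
    key_column: str   # column identifying the data subject
    subject_id: int   # identifier of the subject to erase


def purge_plan(request: DeletionRequest, retain_hours: int = 168) -> List[str]:
    """Return the SQL statements, in order, for one deletion request."""
    return [
        # 1. Logical delete; with deletion vectors, rows are only marked as removed.
        f"DELETE FROM {request.table} "
        f"WHERE {request.key_column} = {int(request.subject_id)}",
        # 2. Rewrite the affected files so soft-deleted rows leave the current files.
        f"REORG TABLE {request.table} APPLY (PURGE)",
        # 3. Remove unreferenced files that are older than the retention window.
        f"VACUUM {request.table} RETAIN {int(retain_hours)} HOURS",
    ]


# Made-up pending requests; in practice these would be read from the control table.
pending = [
    DeletionRequest("main.sales.customers", "customer_id", 42),
    DeletionRequest("main.crm.contacts", "customer_id", 42),
]
plan = [stmt for req in pending for stmt in purge_plan(req)]
for stmt in plan:
    print(stmt)  # on Databricks, each statement could be passed to spark.sql(stmt)
```

Note that VACUUM only removes files older than the retention threshold, so the old files are physically gone only after that window has passed. The same plan would need to run against every layer of the medallion architecture that holds a copy of the subject's data.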

Thanks.

abhijit007
Databricks Partner

Hi @Sumit_7 ,

Thanks for the details. It's helpful.

Hi @aleksandra_ch ,

Thanks. The notebook reference is helpful.