When handling GDPR requests in Databricks, keep the following in mind:
- Use a short retention period on tables that contain personal information, so that old Delta table versions, which may still hold deleted records, are not kept longer than necessary.
- Use APPLY CHANGES to handle Slowly Changing Dimension (SCD) type 1. Type 1 overwrites records in place, so the target does not track history the way type 2 does, and the up-to-date records are held in a separate table.
- When handling customer insertions and GDPR requests, use the change data feed in Databricks. Declare the table as LIVE, not STREAM, so that the data is fully reloaded on each update and records for which a GDPR request has been received are not carried over.
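
The retention advice above can be sketched as a small helper that builds the relevant Delta SQL statements. This is a minimal illustration, not a definitive setup: the table name `customers`, the helper `retention_statements`, and the 7-day window are assumptions chosen for the example. The table properties `delta.logRetentionDuration` and `delta.deletedFileRetentionDuration` and the `VACUUM ... RETAIN ... HOURS` command are standard Delta Lake features.

```python
from typing import List


def retention_statements(table: str, days: int = 7) -> List[str]:
    """Build SQL that limits how long old versions of `table` are retained.

    `table` and `days` are illustrative; pick values that match your policy.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    interval = f"interval {days} days"
    return [
        # Limit how long transaction-log history and removed data files are kept.
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{interval}', "
        f"'delta.deletedFileRetentionDuration' = '{interval}')",
        # Physically delete unreferenced data files older than the window.
        f"VACUUM {table} RETAIN {days * 24} HOURS",
    ]


statements = retention_statements("customers", days=7)
for sql in statements:
    # On Databricks, each statement would be run with spark.sql(sql).
    print(sql)
```

Note that Delta's safety check rejects a `VACUUM` retention shorter than 7 days unless `spark.databricks.delta.retentionDurationCheck.enabled` is set to `false`, so a shorter window needs that setting changed deliberately.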