cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Is there a way to permanenetly purge data in Databricks based on certain condition ?

soumend7115
New Contributor

Is there a way to permanenetly purge data in Databricks based on certain condition ?

Like, from a particular Databricks table, I want to permanently purge certain rows based on a specific condition e.g., WHERE <col1>="Val1" and <col2>="Val2"

1 REPLY 1

Lucas_TBrabo
Databricks Employee
Databricks Employee

Hi @soumend7115!

I will assume you are talking about managed tables in Unity Catalog here, if thats not the case, let me know.

We can segregate this in two steps:

  1. You can use a DELETE FROM SQL statement to remove rows that match your condition. For example: 
    DELETE FROM table_name WHERE col1 = 'Val1' AND col2 = 'Val2'
  2. After running the delete command, the data files are retained a retention period (default is 7 days) to support time travel and rollback. In order to permanently purge the data you can run the VACUUM command. For example:

    VACUUM table_name RETAIN 0 HOURS

    This physically deletes unreferenced files immediately. CAUTION: Setting retention to zero hours disables time travel for those files and should only be used if you're certain the data must be irrecoverable

Worth mentioning: If you have Deletion Vectors enabled in the table, you may need to run a REORG TABLE command.

Documentation reference:

 

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local communityโ€”sign up today to get started!

Sign Up Now