Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Vacuum Command runs without any retention period even though the retention period was set

bricks_2026
New Contributor

Hello

I am writing tests for VACUUM in pytest.

Command executed -> 

VACUUM unittest_mobi_edwhc_bul_replikation_001.t_bul_vacuum_experiment_1 RETAIN 0.05150017944444444 HOURS

But the VACUUM command deletes all files and does not honor the provided parameter: RETAIN 0.05150017944444444 HOURS.

2026-03-21 23:22:34.965|NULL |NULL |VACUUM START |{retentionCheckEnabled -> false, defaultRetentionMillis -> 172800000, specifiedRetentionMillis -> 0}

The table history above shows that specifiedRetentionMillis = 0.

Why does VACUUM delete all the files instead of honoring the provided parameter? Am I missing something?

Thanks in advance for your support.

1 ACCEPTED SOLUTION


Ashwin_DSA
Databricks Employee

Hi @bricks_2026,

In Databricks, the RETAIN num HOURS clause is interpreted as a whole number of hours, not as a fractional duration.

In your example:

VACUUM unittest_mobi_edwhc_bul_replikation_001.t_bul_vacuum_experiment_1
RETAIN 0.05150017944444444 HOURS

that 0.0515... is truncated to 0 hours internally. With the retention safety check disabled (retentionCheckEnabled -> false), the effective retention becomes 0: specifiedRetentionMillis is logged as 0, and VACUUM is allowed to delete all files that are no longer referenced by the latest table state, regardless of age.

That’s why your DESCRIBE HISTORY shows:

{retentionCheckEnabled -> false,
 defaultRetentionMillis -> 172800000,
 specifiedRetentionMillis -> 0}

and why all eligible files are removed.

VACUUM does parse your argument, but fractional hours are effectively truncated to whole hours. With the safety check turned off, RETAIN 0.0515 HOURS behaves the same as RETAIN 0 HOURS, which is why all old files get deleted.
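To illustrate the truncation with plain arithmetic (this is a sketch of the behavior described above, not Delta's actual source code):

```python
# Sketch: how a fractional RETAIN value collapses to 0 hours.
# effective_retention_millis is a hypothetical helper, not a Delta API.

MILLIS_PER_HOUR = 60 * 60 * 1000

def effective_retention_millis(retain_hours: float) -> int:
    """Truncate the RETAIN argument to whole hours, then convert to milliseconds."""
    whole_hours = int(retain_hours)  # 0.05150017944444444 -> 0
    return whole_hours * MILLIS_PER_HOUR

print(effective_retention_millis(0.05150017944444444))  # 0 -> matches specifiedRetentionMillis -> 0
print(effective_retention_millis(1))                    # 3600000 -> one full hour
```

Any value below 1 therefore maps to a retention of 0 milliseconds, which is exactly what the table history logged.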

If you want very small test windows, the minimum supported granularity is 1 hour, so you would use e.g.:

VACUUM my_table RETAIN 1 HOURS;

I appreciate that this is not clearly documented, but I can confirm it based on internal research. Hope this helps.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***


bricks_2026
New Contributor

Hi Ashwin,

Many thanks for your answer. I also tested this locally with pytest yesterday; the minimum supported value is indeed 1 hour.