Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Managed Delta table: time travel blocked after automatic VACUUM

vidya_kothavale
Contributor

Hi,

On a managed Delta table I get:

SELECT * FROM abc VERSION AS OF 25;

Error:

DELTA_UNSUPPORTED_TIME_TRAVEL_BEYOND_DELETED_FILE_RETENTION_DURATION
Cannot time travel beyond delta.deletedFileRetentionDuration (168 HOURS).

Audit logs show VACUUM START/END executed by a service principal (GUID userName); I never ran VACUUM manually. Table properties don’t explicitly set delta.deletedFileRetentionDuration, and Predictive Optimization is enabled (inherited).

Questions:

  1. What Databricks feature/job is triggering this automatic VACUUM on managed tables?
  2. How can I override/disable this for specific tables (e.g. increase retention or opt out)?
  3. Once this error appears, is recovery of that old version still possible (it's a managed table)?
    How should I back up a managed Delta table so I can recover older versions even after VACUUM (e.g. copy the table)?
    Also, in my workspace only one managed table is hitting the time-travel error, while other tables are fine. Why would automatic VACUUM/retention affect this single table but not the others?

6 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @vidya_kothavale ,

1. The feature is called predictive optimization for managed tables. Predictive optimization runs the following operations on Unity Catalog managed tables:

- OPTIMIZE

- VACUUM

- ANALYZE

You can read more here:

Predictive optimization for Unity Catalog managed tables - Azure Databricks | Microsoft Learn

 

2. You can disable predictive optimization for a catalog or schema in the following way:

Predictive optimization for Unity Catalog managed tables | Databricks on AWS

ALTER CATALOG [catalog_name] { ENABLE | DISABLE | INHERIT } PREDICTIVE OPTIMIZATION;
ALTER { SCHEMA | DATABASE } schema_name { ENABLE | DISABLE | INHERIT } PREDICTIVE OPTIMIZATION;

And if you want to increase the retention period, you can use the following query:

ALTER TABLE table_name SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = '30 days');
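To double-check that the override took effect, and to see when automatic maintenance last ran on the table, you can inspect it directly (a quick sketch; `abc` stands in for your table name):

```sql
-- Confirm the retention override is set on the table
SHOW TBLPROPERTIES abc ('delta.deletedFileRetentionDuration');

-- DESCRIBE HISTORY lists recent operations (including VACUUM START/END) and who ran them
DESCRIBE HISTORY abc;
```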

 

3. You can't do much, since VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log and are older than the retention threshold. You can, however, try to restore deleted files using your cloud provider's native options.
For instance, if you have soft deletes enabled on Azure Storage, you can try to use that.

If the answer was helpful, please consider marking it as the accepted solution.

@szymon_dybczak 
I want to disable this property at the workspace level. How can I do that?

Hi @vidya_kothavale ,

You can disable predictive optimization for an account, a catalog, or a schema. All Unity Catalog managed tables inherit the account value by default. You can override the account default at the catalog or schema level.

To disable it for your account, follow the steps below:

Predictive optimization for Unity Catalog managed tables - Azure Databricks | Microsoft Learn

"An account admin can enable predictive optimization for all metastores in an account. Catalogs and schemas inherit this setting by default, but you can override it at either level.

  1. Go to the accounts console.
  2. Navigate to Settings, then Feature enablement.
  3. Select the option you want (for example, Enabled) next to Predictive optimization."

 

If the answer was helpful, please consider marking it as the accepted solution.

balajij8
Contributor III

Hi

The error DELTA_UNSUPPORTED_TIME_TRAVEL_BEYOND_DELETED_FILE_RETENTION_DURATION confirms that the underlying files required for Version 25 have been deleted from the storage. Since the metadata knows those files should be there but finds them gone, it blocks the query.

1. What Databricks feature/job is triggering this automatic VACUUM on managed tables?

The service principal in the logs is the Databricks service executing Predictive Optimization automatically. Predictive Optimization is the default maintenance feature for Unity Catalog managed tables; it runs OPTIMIZE and VACUUM in the background on serverless compute. It targets tables where it detects high file fragmentation or a build-up of expired snapshots, to maintain performance and reduce storage costs.
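If you want to confirm from SQL that it was Predictive Optimization that issued the VACUUM, the operations history system table records each run (a sketch; this assumes system tables are enabled in your workspace, and `abc` stands in for your table name):

```sql
-- Each row is one OPTIMIZE/VACUUM/ANALYZE run triggered by Predictive Optimization
SELECT start_time, operation_type, operation_status, table_name
FROM system.storage.predictive_optimization_operations_history
WHERE table_name = 'abc'
ORDER BY start_time DESC;
```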

2. How can I override/disable this for specific tables (e.g. increase retention or opt out)?

You can:

  • Increase retention: Delta keeps 7 days (168 hours) of history by default. You can increase it as follows:
    ALTER TABLE eud_poland.staging.pibb_extract_preprocessed 
    SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = '30 days');
  • Opt out: you can disable the service for a specific catalog or schema. More details here

    If you disable it, you must manage table optimization (OPTIMIZE and VACUUM) manually to avoid performance degradation.

3. Is recovery of that old version still possible (it's a managed table)?

No. Once VACUUM is complete and the files are deleted from the storage, the old state of the data is gone.

4. How should I back up a managed Delta table so I can recover older versions even after VACUUM (e.g. copy table)?

Time travel is not a long-term backup solution. You can use Delta deep clone for long-term backup:

  • Deep Clone - You can create a separate physical copy of the data and metadata.

    CREATE TABLE eud_poland.staging.pibb_extract_backup
    DEEP CLONE eud_poland.staging.pibb_extract_preprocessed;
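Deep clones can also be refreshed incrementally: re-running the statement with CREATE OR REPLACE copies only the files that changed since the last run, so a scheduled job can keep the backup current (table names as in the example above):

```sql
-- Re-run periodically; only new/changed files are copied on refresh
CREATE OR REPLACE TABLE eud_poland.staging.pibb_extract_backup
DEEP CLONE eud_poland.staging.pibb_extract_preprocessed;
```

Note that the clone has its own history: source versions older than your first clone cannot be recovered from it.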

5. Why is only this table affected?

Predictive Optimization does not hit every table with the same frequency. This specific table is targeted because:

  1. Frequent operations on this table create many small files, which crosses the optimization threshold.

  2. Table Size/Growth: The service prioritizes tables where storage savings or performance gains are most significant.

If a Delta table has 10 historical versions and none of them have been modified or referenced in the last 7 days (the retention period), when VACUUM runs, does it delete all versions and their files, or does it keep the latest version and only delete the older unused files?

balajij8
Contributor III

VACUUM will never delete the files of the latest version, even if Version 10 was not accessed or modified, because it represents the current state of the table. VACUUM targets files that are no longer referenced by the current version: it identifies files that were removed (by DELETE/UPDATE etc. in Versions 0-9), and if those files are not part of Version 10 and their deletion timestamp in the transaction log is older than the 7-day retention threshold, they are permanently deleted.
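You can preview exactly which files a run would remove, without deleting anything, by using VACUUM's dry-run mode (`abc` is a placeholder table name):

```sql
-- Lists the files that would be deleted under a 168-hour retention threshold
VACUUM abc RETAIN 168 HOURS DRY RUN;
```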