12-05-2022 01:56 PM
Below is the integrity check error we get when trying to set the delta.deletedFileRetentionDuration table property to 10 days.
Observation: The table data sits in S3. The files total several terabytes, and there are millions of files for this table.
What is the best way to clear out the error apart from dropping and recreating the table?
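For reference, a minimal sketch of how such a property is typically set, assuming the standard ALTER TABLE ... SET TBLPROPERTIES syntax. The helper name retention_statement and the table name my_db.my_table are placeholders for illustration, not taken from the original post.

```python
def retention_statement(table_name: str, days: int) -> str:
    """Build the ALTER TABLE statement that sets the deleted-file retention.

    table_name and days are supplied by the caller; nothing is executed here.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    return (
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES "
        f"('delta.deletedFileRetentionDuration' = 'interval {days} days')"
    )


# Placeholder table name; replace with the real one.
stmt = retention_statement("my_db.my_table", 10)
print(stmt)
# On a cluster this string would be passed to spark.sql(stmt).
```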
12-06-2022 12:11 AM
This might be caused by a problem in the transaction log. Since this is an external table in Delta format, running CREATE OR REPLACE TABLE should fix the transaction log issue.
However, if the issue persists, you can contact Databricks support or set this config so that the corruption is not treated as fatal:
spark.conf.set("spark.databricks.delta.state.corruptionIsFatal", False)
Hope this helps.
Cheers.
12-06-2022 03:20 AM
Please back up your table, then run the file repair:
FSCK REPAIR TABLE table_name
You can also try a dry run first:
FSCK REPAIR TABLE table_name DRY RUN
If the data is partitioned, it can be helpful to refresh the metastore:
MSCK REPAIR TABLE mytable
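The order of the commands above could be scripted roughly like this. It is only a sketch: the helper repair_statements and the table name my_db.my_table are hypothetical, and whether the MSCK step applies to your table type is an assumption you should verify before running it.

```python
from typing import List


def repair_statements(table_name: str, partitioned: bool = False) -> List[str]:
    """Return the repair statements in the suggested order: dry run first."""
    statements = [
        # Preview which file entries would be removed from the transaction log.
        f"FSCK REPAIR TABLE {table_name} DRY RUN",
        # Remove entries for files that no longer exist in storage.
        f"FSCK REPAIR TABLE {table_name}",
    ]
    if partitioned:
        # Refresh partition metadata in the metastore.
        statements.append(f"MSCK REPAIR TABLE {table_name}")
    return statements


# Placeholder table name; back up the table before running anything.
for stmt in repair_statements("my_db.my_table", partitioned=True):
    print(stmt)
    # On a cluster: spark.sql(stmt).show(truncate=False)
```

Reviewing the dry-run output before running the actual repair keeps the destructive step deliberate.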
My blog: https://databrickster.medium.com/