VACUUM seems to be deleting Autoloader's log files.
05-06-2024 07:13 AM
Hello everyone,
I have a workflow set up that updates a few Delta tables incrementally with Auto Loader three times a day. I also run a separate workflow that performs OPTIMIZE and VACUUM on these tables once a week.
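For reference, each incremental run looks roughly like this (the source format, paths, and target table name below are simplified placeholders, not the real values):

```python
# Simplified sketch of one incremental Auto Loader run; the format, paths,
# and table name are placeholders. `spark` is the notebook's SparkSession.
(spark.readStream
    .format("cloudFiles")                                              # Auto Loader source
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "dbfs:/mnt/checkpoints/my_table/schema")
    .load("dbfs:/mnt/landing/my_table")
    .writeStream
    .option("checkpointLocation", "dbfs:/mnt/checkpoints/my_table")
    .trigger(availableNow=True)                                        # process new files, then stop
    .toTable("my_schema.my_table"))
```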
The issue I'm facing is that the first incremental workflow execution following the weekly optimization almost always fails with the following error message:
"Stream stopped... org.apache.spark.SparkException: Exception thrown in awaitResult: dbfs:/mnt/{PATH}/sources/0/rocksdb/logs/{FILE}.log."
This error refers to a log file that no longer exists. This issue doesn't occur with all tables, just with the larger ones.
Here are the properties of the tables where this error is happening:
```sql
TBLPROPERTIES (
  "delta.autoOptimize.autoCompact" = "true",
  "delta.enableChangeDataFeed" = "true",
  "delta.autoOptimize.optimizeWrite" = "true",
  "delta.columnMapping.mode" = "name",
  "delta.deletedFileRetentionDuration" = "7 days",
  "delta.logRetentionDuration" = "7 days",
  "delta.minReaderVersion" = "2",
  "delta.minWriterVersion" = "5",
  "delta.targetFileSize" = "128mb"
)
```
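The weekly maintenance job boils down to something like this (the table name is a placeholder):

```python
# Weekly maintenance workflow; the table name is a placeholder.
spark.sql("OPTIMIZE my_schema.my_table")
spark.sql("VACUUM my_schema.my_table")   # default retention: 7 days
```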
Has anyone experienced this kind of issue before? Any ideas on what might be causing this problem or suggestions for how to prevent it from happening?
Thanks in advance for your help!
12-04-2024 02:49 PM
The error message suggests that Auto Loader's state is being deleted out from under the stream, most likely by a separate process. If your streaming checkpoint lives inside the root of a Delta table, VACUUM can delete its files, so make sure you do not store checkpoints inside Delta table locations.
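For example, a layout along these lines keeps the checkpoint outside anything VACUUM touches (all paths are illustrative, and `spark` is the notebook's SparkSession):

```python
# Illustrative layout: the streaming checkpoint lives in its own directory,
# completely outside the Delta table's root, so VACUUM on the table never
# touches the RocksDB state files underneath it. All paths are placeholders.
table_path      = "dbfs:/mnt/datalake/tables/my_table"        # vacuumed weekly
checkpoint_path = "dbfs:/mnt/datalake/checkpoints/my_table"   # never vacuumed

(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", f"{checkpoint_path}/schema")
    .load("dbfs:/mnt/landing/my_table")
    .writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)
    .start(table_path))
```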
Otherwise, you may want to enable storage logging to get more information about how the files are being deleted.
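Before turning to storage logs, a quick check along these lines (paths are placeholders) confirms whether the checkpoint actually sits under the table root:

```python
# Placeholder paths; compare the checkpoint location against the table root.
table_path      = "dbfs:/mnt/datalake/tables/my_table"
checkpoint_path = "dbfs:/mnt/datalake/checkpoints/my_table"

if (checkpoint_path.rstrip("/") + "/").startswith(table_path.rstrip("/") + "/"):
    print("Checkpoint is inside the table root - VACUUM can delete its state files")
else:
    print("Checkpoint is outside the table root")
```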