Hello everyone,
I have a workflow that incrementally updates a few Delta tables with Auto Loader three times a day. Additionally, I run a separate workflow that performs VACUUM and OPTIMIZE on these tables once a week.
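For context, the ingestion side looks roughly like this. It's a simplified sketch: the source format, paths, and table name (dbfs:/mnt/raw/events, bronze.events) are placeholders, not the real ones.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available in a Databricks notebook

(spark.readStream
    .format("cloudFiles")                                            # Auto Loader source
    .option("cloudFiles.format", "json")                             # placeholder source format
    .option("cloudFiles.schemaLocation", "dbfs:/mnt/checkpoints/events/_schema")
    .load("dbfs:/mnt/raw/events")
    .writeStream
    .option("checkpointLocation", "dbfs:/mnt/checkpoints/events")    # the sources/0/rocksdb folder from the error lives under here
    .trigger(availableNow=True)                                      # one incremental batch per scheduled run
    .toTable("bronze.events"))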
The issue I'm facing is that the first run of the incremental workflow after the weekly maintenance job almost always fails with the following error message:
"Stream stopped... org.apache.spark.SparkException: Exception thrown in awaitResult: dbfs:/mnt/{PATH}/sources/0/rocksdb/logs/{FILE}.log."
This error refers to a log file that no longer exists. This issue doesn't occur with all tables, just with the larger ones.
Here are the properties of the tables where this error is happening:
TBLPROPERTIES (
"delta.autoOptimize.autoCompact" = "true",
"delta.enableChangeDataFeed" = "true",
"delta.autoOptimize.optimizeWrite" = "true",
"delta.columnMapping.mode" = "name",
"delta.deletedFileRetentionDuration" = "7 days",
"delta.logRetentionDuration" = "7 days",
"delta.minReaderVersion" = "2",
"delta.minWriterVersion" = "5",
"delta.targetFileSize" = "128mb"
)
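The weekly maintenance job is essentially just OPTIMIZE followed by VACUUM for each table, roughly like this (again, the table name is a placeholder):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

for t in ["bronze.events"]:      # placeholder table list
    spark.sql(f"OPTIMIZE {t}")
    spark.sql(f"VACUUM {t}")     # default retention, governed by delta.deletedFileRetentionDuration above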
Has anyone experienced this kind of issue before? Any ideas on what might be causing it, or suggestions for how to prevent it?
Thanks in advance for your help!