Data Engineering

VACUUM seems to be deleting Autoloader's log files.

Menegat
New Contributor

Hello everyone,

I have a workflow set up that updates a few Delta tables incrementally with Auto Loader three times a day. Additionally, I run a separate workflow that performs VACUUM and OPTIMIZE on these tables once a week.
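For context, the weekly maintenance step boils down to statements like the ones below (the table name is just a placeholder; in the real job they are executed with spark.sql in a Databricks notebook):

```python
# Sketch of the weekly maintenance step. The table name is a placeholder;
# in the actual job each statement is run via spark.sql(...).

def maintenance_statements(table: str, retain_hours: int = 168) -> list[str]:
    """Build the OPTIMIZE and VACUUM statements for one Delta table."""
    return [
        f"OPTIMIZE {table}",
        # 168 hours = 7 days, matching delta.deletedFileRetentionDuration
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
    ]

for stmt in maintenance_statements("my_schema.my_table"):
    print(stmt)
```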

The issue I'm facing is that the first incremental workflow execution following the weekly optimization almost always fails with the following error message:

"Stream stopped... org.apache.spark.SparkException: Exception thrown in awaitResult: dbfs:/mnt/{PATH}/sources/0/rocksdb/logs/{FILE}.log."

This error refers to a log file that no longer exists. This issue doesn't occur with all tables, just with the larger ones.

Here are the properties of the tables where this error is happening:

TBLPROPERTIES (
"delta.autoOptimize.autoCompact" = "true",
"delta.enableChangeDataFeed" = "true",
"delta.autoOptimize.optimizeWrite" = "true",
"delta.columnMapping.mode" = "name",
"delta.deletedFileRetentionDuration" = "7 days",
"delta.logRetentionDuration" = "7 days",
"delta.minReaderVersion" = "2",
"delta.minWriterVersion" = "5",
"delta.targetFileSize" = "128mb"
)
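As a sanity check on the retention settings above (assuming the weekly VACUUM runs with its default threshold), the two 7-day properties line up with VACUUM's default 168-hour window:

```python
# Sanity check: both retention properties above are "7 days", and VACUUM's
# default retention threshold is 168 hours -- the same window. If the weekly
# job passed a shorter RETAIN value, files could disappear earlier than the
# table properties suggest.

def days_to_hours(prop: str) -> int:
    """Convert a Delta retention string like '7 days' to hours."""
    value, unit = prop.split()
    if unit.lower() not in ("day", "days"):
        raise ValueError(f"expected a day-based duration, got {prop!r}")
    return int(value) * 24

print(days_to_hours("7 days"))  # 168, i.e. VACUUM's default RETAIN window
```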

Has anyone experienced this kind of issue before? Any ideas on what might be causing this problem or suggestions for how to prevent it from happening?

Thanks in advance for your help!

1 REPLY

Kaniz
Community Manager

Hi @Menegat, It seems you’re encountering an issue with your Delta tables during incremental updates.

Let’s dive into this and explore potential solutions.

  1. Auto Loader and Incremental Updates:

    • Your workflow uses Auto Loader streams to load data incrementally. Auto Loader tracks which files it has already processed in a RocksDB state store under the stream's checkpoint location (that is the sources/0/rocksdb path in your error message).
  2. Error Message and Log File:

    • The error message you’re encountering refers to a log file that no longer exists. This could be due to the weekly optimization process.
    • When you perform VACUUM, files older than the retention threshold are deleted; if the stream's checkpoint (RocksDB) files sit under a location the cleanup touches, they can be removed as well, which breaks the next incremental update.
  3. Potential Causes and Solutions:

    • Log Retention Duration: Check if the log retention duration is set appropriately. If it’s too short, logs might get deleted before the next incremental update.
    • Optimization Impact: The optimization process could be affecting the stream. Consider adjusting the timing of your workflows to avoid conflicts.
    • Table Size: You mentioned this issue occurs mainly with larger tables. Larger tables might have more complex optimization requirements. Consider adjusting the optimization settings or splitting large tables into smaller ones.
    • Spark Configuration: Ensure that your Spark configuration is optimized for your workload. You can adjust parameters related to memory, parallelism, and resource allocation.
    • Dynamic Overwrite: If you're overwriting partitions during updates, consider using dynamic partition overwrite (spark.sql.sources.partitionOverwriteMode set to "dynamic") so only the partitions being written are replaced.
  4. Next Steps:

    • Review the above points and check if any adjustments are needed in your setup.
    • Monitor the behaviour after making changes to see if the issue persists.
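One concrete check worth doing (this is an assumption on my part, based on the sources/0/rocksdb path in your error, which is where Auto Loader keeps its RocksDB state): verify that each stream's checkpoint location is not inside a directory your cleanup touches. VACUUM skips directories whose names begin with an underscore (such as _delta_log), but it can delete unrecognized files elsewhere under the table path. A minimal sketch of that layout check, with hypothetical paths:

```python
def checkpoint_at_risk(table_path: str, checkpoint_path: str) -> bool:
    """True if the checkpoint sits under the table directory in a
    subdirectory that VACUUM will scan. VACUUM skips directories whose
    names begin with an underscore, such as _delta_log."""
    table = table_path.rstrip("/") + "/"
    checkpoint = checkpoint_path.rstrip("/")
    if not (checkpoint + "/").startswith(table):
        return False  # outside the table directory: VACUUM never touches it
    first_component = checkpoint[len(table):].split("/", 1)[0]
    return not first_component.startswith("_")

# Hypothetical paths for illustration only:
print(checkpoint_at_risk("dbfs:/mnt/data/tbl", "dbfs:/mnt/data/tbl/chk"))     # True
print(checkpoint_at_risk("dbfs:/mnt/data/tbl", "dbfs:/mnt/checkpoints/tbl"))  # False
```

If a checkpoint does live in a scanned subdirectory, moving it outside the table path (or into an underscore-prefixed directory) keeps VACUUM away from the RocksDB log files.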

Remember that Auto Loader and Delta tables are powerful but require careful configuration to ensure smooth incremental updates. Good luck! 🚀