Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

VACUUM seems to be deleting Autoloader's log files.

Menegat
New Contributor

Hello everyone,

I have a workflow set up that updates a few Delta tables incrementally with Auto Loader three times a day. Additionally, I run a separate workflow that performs VACUUM and OPTIMIZE on these tables once a week.

The issue I'm facing is that the first incremental workflow execution following the weekly optimization almost always fails with the following error message:

"Stream stopped... org.apache.spark.SparkException: Exception thrown in awaitResult: dbfs:/mnt/{PATH}/sources/0/rocksdb/logs/{FILE}.log."

This error refers to a log file that no longer exists. This issue doesn't occur with all tables, just with the larger ones.

Here are the properties of the tables where this error is happening:

TBLPROPERTIES (
"delta.autoOptimize.autoCompact" = "true",
"delta.enableChangeDataFeed" = "true",
"delta.autoOptimize.optimizeWrite" = "true",
"delta.columnMapping.mode" = "name",
"delta.deletedFileRetentionDuration" = "7 days",
"delta.logRetentionDuration" = "7 days",
"delta.minReaderVersion" = "2",
"delta.minWriterVersion" = "5",
"delta.targetFileSize" = "128mb"
)
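
For reference, here is a rough sketch of the kind of code the two workflows run. The paths, file format, and trigger below are placeholders rather than my exact configuration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder locations, not the real mount points.
source_path = "dbfs:/mnt/landing/events/"
table_path = "dbfs:/mnt/delta/events/"
checkpoint_path = "dbfs:/mnt/checkpoints/events/"

# Incremental workflow (three times a day): Auto Loader keeps its file-tracking
# state (the RocksDB directory mentioned in the error) under the checkpoint.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .load(source_path)
    .writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)   # process whatever is new, then stop
    .start(table_path)
    .awaitTermination())

# Weekly maintenance workflow (separate job).
spark.sql(f"OPTIMIZE delta.`{table_path}`")
spark.sql(f"VACUUM delta.`{table_path}`")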

Has anyone experienced this kind of issue before? Any ideas on what might be causing this problem or suggestions for how to prevent it from happening?

Thanks in advance for your help!

1 REPLY

Kaniz
Community Manager

Hi @Menegat, it seems you’re encountering an issue with your Delta tables during incremental updates.

Let’s dive into this and explore potential solutions.

  1. Error Message and Log File:

    • The error refers to a log file that no longer exists. This could be a result of the weekly maintenance run.
    • When you perform VACUUM and OPTIMIZE, the cleanup can delete old log files that the stream still needs, causing a failure on the next incremental update.
  2. Potential Causes and Solutions:

    • Log Retention Duration: Check whether the retention durations are set appropriately. If they’re too short, files can be deleted before the next incremental run needs them (see the sketch after this list).
    • Optimization Impact: The maintenance job could be interfering with the stream. Consider adjusting the timing of your workflows so they don’t conflict.
    • Table Size: You mentioned this issue occurs mainly with larger tables. Larger tables take longer to optimize and vacuum, which widens the window for conflicts; consider adjusting the optimization settings or splitting very large tables.
    • Spark Configuration: Make sure your Spark configuration suits the workload; you can tune memory, parallelism, and resource allocation.
    • Dynamic Overwrite: If you’re overwriting partitions during updates, consider using dynamic partition overwrite so only the affected partitions are rewritten.
  3. Next Steps:

    • Review the points above and check whether any adjustments are needed in your setup.
    • Monitor the behaviour after making changes to see whether the issue persists.
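
For example, here’s an illustrative way to lengthen the retention windows and vacuum with an explicit retention period; the table name and durations below are placeholders, not recommendations tuned to your workload:

# Placeholder table name and durations; adjust to your own retention needs.
spark.sql("""
    ALTER TABLE my_schema.my_table SET TBLPROPERTIES (
        'delta.logRetentionDuration' = '30 days',
        'delta.deletedFileRetentionDuration' = '14 days'
    )
""")

# Vacuum while keeping at least 14 days (336 hours) of removed files.
spark.sql("VACUUM my_schema.my_table RETAIN 336 HOURS")

# Verify the properties after the change.
spark.sql("SHOW TBLPROPERTIES my_schema.my_table").show(truncate=False)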

Remember that Auto Loader streams are powerful but require careful coordination with table maintenance to keep incremental updates running smoothly. Good luck! 🚀

 