VACUUM seems to be deleting Autoloader's log files.

Menegat
New Contributor

Hello everyone,

I have a workflow set up that incrementally updates a few Delta tables with Auto Loader three times a day. Additionally, I run a separate workflow that performs VACUUM and OPTIMIZE on these tables once a week.

The issue I'm facing is that the first incremental run after the weekly VACUUM/OPTIMIZE job almost always fails with the following error message:

"Stream stopped... org.apache.spark.SparkException: Exception thrown in awaitResult: dbfs:/mnt/{PATH}/sources/0/rocksdb/logs/{FILE}.log."

This error refers to a log file that no longer exists. This issue doesn't occur with all tables, just with the larger ones.

Here are the properties of the tables where this error is happening:

TBLPROPERTIES (
"delta.autoOptimize.autoCompact" = "true",
"delta.enableChangeDataFeed" = "true",
"delta.autoOptimize.optimizeWrite" = "true",
"delta.columnMapping.mode" = "name",
"delta.deletedFileRetentionDuration" = "7 days",
"delta.logRetentionDuration" = "7 days",
"delta.minReaderVersion" = "2",
"delta.minWriterVersion" = "5",
"delta.targetFileSize" = "128mb"
)
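To reason about what a weekly VACUUM can remove under "delta.deletedFileRetentionDuration" = "7 days", here is a deliberately simplified model of how VACUUM selects files. `StoredFile` and `vacuum_candidates` are hypothetical names, and the rules are an approximation rather than Delta Lake's implementation: a file is selected when no path component starts with `_` or `.`, the current table version does not reference it, and it is older than the retention window. The real command's rules are more detailed (for example, how the age of data files removed by OPTIMIZE is measured, and partition directories whose names start with `_`).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import PurePosixPath
from typing import Iterable, List, Set


@dataclass
class StoredFile:
    path: str           # path relative to the table root (hypothetical model)
    modified: datetime  # last-modified time reported by storage


def vacuum_candidates(files: Iterable[StoredFile], tracked: Set[str],
                      retention: timedelta, now: datetime) -> List[str]:
    """Simplified model (an assumption, not Delta's code) of VACUUM selection."""
    cutoff = now - retention
    doomed = []
    for f in files:
        parts = PurePosixPath(f.path).parts
        if any(p.startswith(("_", ".")) for p in parts):
            continue  # model: hidden paths such as _delta_log are skipped
        if f.path in tracked:
            continue  # still referenced by the current table version
        if f.modified < cutoff:
            doomed.append(f.path)  # unreferenced and older than retention
    return doomed


now = datetime(2024, 1, 15)
files = [
    StoredFile("part-0001.parquet", now - timedelta(days=30)),  # live data file
    StoredFile("part-0000.parquet", now - timedelta(days=30)),  # no longer referenced
    StoredFile("checkpoint/sources/0/rocksdb/logs/000123.log", now - timedelta(days=9)),
    StoredFile("checkpoint/sources/0/rocksdb/logs/000200.log", now - timedelta(days=1)),
    StoredFile("_checkpoint/sources/0/rocksdb/logs/000123.log", now - timedelta(days=9)),
]
print(vacuum_candidates(files, {"part-0001.parquet"}, timedelta(days=7), now))
# ['part-0000.parquet', 'checkpoint/sources/0/rocksdb/logs/000123.log']
```

In this model, a checkpoint stored under the table root is never referenced by the transaction log, so any of its files left untouched for longer than the retention window become candidates, while recently rewritten files survive.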

Has anyone experienced this kind of issue before? Any ideas on what might be causing this problem or suggestions for how to prevent it from happening?

Thanks in advance for your help!


cgrant
Databricks Employee

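The error message suggests that Auto Loader's state is being improperly deleted, most likely by a separate process. If your checkpoint is stored inside the root directory of a Delta table, VACUUM can delete its files: VACUUM removes files in the table directory that the Delta transaction log does not reference once they are older than the retention threshold, and checkpoint files are never referenced by that log. Make sure that you do not store checkpoints inside Delta table locations.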
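One way to enforce this is a small guard that runs before the stream starts and refuses a checkpoint location nested under the table root. This is a sketch: `checkpoint_inside_table` is a hypothetical helper, not a Databricks API, and the paths in the example are made up. It compares paths lexically only, so it does not resolve `..`, symlinks, or two different URIs that point at the same storage (for example a `/mnt` mount and the direct cloud URI of the same container).

```python
from pathlib import PurePosixPath
from typing import Tuple
from urllib.parse import urlparse


def _split(uri: str) -> Tuple[str, PurePosixPath]:
    """Split 'dbfs:/mnt/x' or 's3://bucket/x' into (scheme+authority, path)."""
    parsed = urlparse(uri)
    return f"{parsed.scheme}://{parsed.netloc}", PurePosixPath(parsed.path or "/")


def checkpoint_inside_table(checkpoint: str, table_root: str) -> bool:
    """True if the checkpoint is the table root itself or nested beneath it."""
    cp_loc, cp_path = _split(checkpoint)
    tbl_loc, tbl_path = _split(table_root)
    if cp_loc != tbl_loc:
        return False  # different scheme or bucket/container
    # Compare whole path components, so '.../sales_checkpoint' is not
    # mistaken for a child of '.../sales' the way a string prefix check would.
    return cp_path == tbl_path or tbl_path in cp_path.parents


print(checkpoint_inside_table("dbfs:/mnt/lake/tables/sales/checkpoint",
                              "dbfs:/mnt/lake/tables/sales"))  # True
print(checkpoint_inside_table("dbfs:/mnt/lake/checkpoints/sales",
                              "dbfs:/mnt/lake/tables/sales"))  # False
```

Raising an error when this returns True, before calling `writeStream`, keeps a misplaced checkpoint from being created in the first place.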

Otherwise, you may want to enable storage logging to get more information about how the files are being deleted.
