
VACUUM seems to be deleting Autoloader's log files.

Menegat
New Contributor

Hello everyone,

I have a workflow set up that incrementally updates a few Delta tables with Autoloader three times a day. I also run a separate workflow once a week that performs OPTIMIZE and VACUUM on these tables.
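For context, the incremental load looks roughly like this (the source path, checkpoint path, and table name below are placeholders, not my actual ones):

# Placeholders, not my real paths/names
source_path     = "dbfs:/mnt/landing/my_source/"
checkpoint_path = "dbfs:/mnt/my_table_location/checkpoint/"   # the file from the error lives under <checkpoint>/sources/0/rocksdb/logs/
target_table    = "my_schema.my_table"

(spark.readStream
    .format("cloudFiles")                        # Autoloader source
    .option("cloudFiles.format", "parquet")      # assumed source file format
    .load(source_path)
 .writeStream
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)                  # triggered run, three times a day
    .toTable(target_table))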

The issue I'm facing is that the first incremental workflow execution following the weekly optimization almost always fails with the following error message:

"Stream stopped... org.apache.spark.SparkException: Exception thrown in awaitResult: dbfs:/mnt/{PATH}/sources/0/rocksdb/logs/{FILE}.log."

This error refers to a log file that no longer exists. This issue doesn't occur with all tables, just with the larger ones.

Here are the properties of the tables where this error is happening:

TBLPROPERTIES (
"delta.autoOptimize.autoCompact" = "true",
"delta.enableChangeDataFeed" = "true",
"delta.autoOptimize.optimizeWrite" = "true",
"delta.columnMapping.mode" = "name",
"delta.deletedFileRetentionDuration" = "7 days",
"delta.logRetentionDuration" = "7 days",
"delta.minReaderVersion" = "2",
"delta.minWriterVersion" = "5",
"delta.targetFileSize" = "128mb"
)
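The weekly maintenance workflow runs roughly the following on each table (the table name is again a placeholder; the retention matches the 7-day settings above):

spark.sql("OPTIMIZE my_schema.my_table")
spark.sql("VACUUM my_schema.my_table RETAIN 168 HOURS")   # 168 hours = 7 days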

Has anyone experienced this kind of issue before? Any ideas on what might be causing this problem or suggestions for how to prevent it from happening?

Thanks in advance for your help!

1 REPLY

cgrant
Databricks Employee

The error message suggests that Autoloader's state (the RocksDB files under the stream's checkpoint) is being deleted by a separate process. If your checkpoint is located inside the root of a Delta table, VACUUM can delete its files, since VACUUM removes files under the table directory that are not tracked by the Delta transaction log. Make sure you do not store checkpoints inside Delta table locations.
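As an illustration (paths here are hypothetical), what matters is whether the checkpoint sits under the table's own directory:

# At risk: checkpoint nested inside the Delta table's directory, so a VACUUM on the
# table can remove the RocksDB state files that Autoloader needs.
table_path      = "dbfs:/mnt/datalake/my_table/"
bad_checkpoint  = "dbfs:/mnt/datalake/my_table/checkpoint/"

# Safer: keep checkpoints in a dedicated location outside any Delta table directory.
good_checkpoint = "dbfs:/mnt/checkpoints/my_table/"

Keep in mind that pointing an existing stream at a new, empty checkpoint location makes it start over, so a move like this needs to be planned.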

If the checkpoints are already outside of the table locations, you may want to enable access logging on the underlying storage to get more information about which process is deleting the files.
