<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic VACUUM seems to be deleting Autoloader's log files. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/vacuum-seems-to-be-deleting-autoloader-s-log-files/m-p/68273#M33620</link>
    <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;I have a workflow set up that updates a few Delta tables incrementally with Auto Loader three times a day. Additionally, I run a separate workflow that performs VACUUM and OPTIMIZE on these tables once a week.&lt;/P&gt;&lt;P&gt;The issue I'm facing is that the first incremental run after the weekly optimization almost always fails with the following error message:&lt;/P&gt;&lt;P&gt;"Stream stopped... org.apache.spark.SparkException: Exception thrown in awaitResult: dbfs:/mnt/{PATH}/sources/0/rocksdb/logs/{FILE}.log."&lt;/P&gt;&lt;P&gt;The error refers to a log file that no longer exists. The issue doesn't occur with all tables, only with the larger ones.&lt;/P&gt;&lt;P&gt;Here are the properties of the tables where this error is happening:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;TBLPROPERTIES (&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.autoOptimize.autoCompact" = "true",&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.enableChangeDataFeed" = "true",&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.autoOptimize.optimizeWrite" = "true",&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.columnMapping.mode" = "name",&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.deletedFileRetentionDuration" = "7 days",&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.logRetentionDuration" = "7 days",&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.minReaderVersion" = "2",&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.minWriterVersion" = "5",&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.targetFileSize" = "128mb"&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;)&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Has anyone experienced this kind of issue before? Any ideas on what might be causing this problem, or suggestions for how to prevent it?&lt;/P&gt;&lt;P&gt;Thanks in advance for your help!&lt;/P&gt;</description>
    <pubDate>Mon, 06 May 2024 14:13:46 GMT</pubDate>
    <dc:creator>Menegat</dc:creator>
    <dc:date>2024-05-06T14:13:46Z</dc:date>
    <item>
      <title>VACUUM seems to be deleting Autoloader's log files.</title>
      <link>https://community.databricks.com/t5/data-engineering/vacuum-seems-to-be-deleting-autoloader-s-log-files/m-p/68273#M33620</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;I have a workflow set up that updates a few Delta tables incrementally with Auto Loader three times a day. Additionally, I run a separate workflow that performs VACUUM and OPTIMIZE on these tables once a week.&lt;/P&gt;&lt;P&gt;The issue I'm facing is that the first incremental run after the weekly optimization almost always fails with the following error message:&lt;/P&gt;&lt;P&gt;"Stream stopped... org.apache.spark.SparkException: Exception thrown in awaitResult: dbfs:/mnt/{PATH}/sources/0/rocksdb/logs/{FILE}.log."&lt;/P&gt;&lt;P&gt;The error refers to a log file that no longer exists. The issue doesn't occur with all tables, only with the larger ones.&lt;/P&gt;&lt;P&gt;Here are the properties of the tables where this error is happening:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;TBLPROPERTIES (&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.autoOptimize.autoCompact" = "true",&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.enableChangeDataFeed" = "true",&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.autoOptimize.optimizeWrite" = "true",&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.columnMapping.mode" = "name",&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.deletedFileRetentionDuration" = "7 days",&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.logRetentionDuration" = "7 days",&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.minReaderVersion" = "2",&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.minWriterVersion" = "5",&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;"delta.targetFileSize" = "128mb"&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;)&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Has anyone experienced this kind of issue before? Any ideas on what might be causing this problem, or suggestions for how to prevent it?&lt;/P&gt;&lt;P&gt;Thanks in advance for your help!&lt;/P&gt;</description>
      <pubDate>Mon, 06 May 2024 14:13:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/vacuum-seems-to-be-deleting-autoloader-s-log-files/m-p/68273#M33620</guid>
      <dc:creator>Menegat</dc:creator>
      <dc:date>2024-05-06T14:13:46Z</dc:date>
    </item>
    <item>
      <title>Re: VACUUM seems to be deleting Autoloader's log files.</title>
      <link>https://community.databricks.com/t5/data-engineering/vacuum-seems-to-be-deleting-autoloader-s-log-files/m-p/100985#M40501</link>
      <description>&lt;P&gt;The error message suggests that Auto Loader's state is being deleted by a separate process. If your streaming checkpoint is stored under the root of a Delta table, VACUUM on that table can delete the checkpoint's files (including the RocksDB log files referenced in the error). Make sure you do not store checkpoints inside Delta table locations.&lt;/P&gt;
&lt;P&gt;Otherwise, you may want to enable storage-level access logging to get more information about how the files are being deleted.&lt;/P&gt;</description>
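<!-- A minimal sketch of the check described in the reply above: whether a streaming
     checkpoint path sits under a Delta table's root, where VACUUM could delete its
     state. The function name and the example dbfs:/ paths are illustrative, not
     taken from the thread. -->
```python
from pathlib import PurePosixPath

def checkpoint_inside_table(table_path: str, checkpoint_path: str) -> bool:
    """Return True if the streaming checkpoint lives under the Delta table root,
    where a VACUUM on the table could delete the stream's RocksDB state files."""
    table = PurePosixPath(table_path.rstrip("/"))
    ckpt = PurePosixPath(checkpoint_path.rstrip("/"))
    # At risk if the checkpoint equals the table root or is nested anywhere below it.
    return ckpt == table or table in ckpt.parents

# A checkpoint under the table root is at risk of being vacuumed:
checkpoint_inside_table("dbfs:/mnt/lake/events", "dbfs:/mnt/lake/events/_checkpoint")  # True
# A sibling directory tree outside the table is safe:
checkpoint_inside_table("dbfs:/mnt/lake/events", "dbfs:/mnt/checkpoints/events")       # False
```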
      <pubDate>Wed, 04 Dec 2024 22:49:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/vacuum-seems-to-be-deleting-autoloader-s-log-files/m-p/100985#M40501</guid>
      <dc:creator>cgrant</dc:creator>
      <dc:date>2024-12-04T22:49:53Z</dc:date>
    </item>
  </channel>
</rss>

