<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Issue with DeltaFileNotFoundException After Vacuum and Missing Data Changes in Delta Log in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/issue-with-deltafilenotfoundexception-after-vacuum-and-missing/m-p/105695#M42242</link>
    <description>&lt;P&gt;Dear Databricks experts,&lt;/P&gt;&lt;P&gt;I encountered the following error in Databricks:&lt;/P&gt;&lt;P&gt;`com.databricks.sql.transaction.tahoe.DeltaFileNotFoundException: [DELTA_EMPTY_DIRECTORY] No file found in the directory: gs://cimb-prod-lakehouse/bronze-layer/losdb/pl_message/_delta_log.`&lt;/P&gt;&lt;P&gt;This issue occurred after running a **Vacuum** operation. Despite continuous data ingestion, I noticed that no changes were reflected in the Delta log (`_delta_log`). This raises a few questions:&lt;/P&gt;&lt;P&gt;1. Why does the **Vacuum** operation delete essential files, such as those required for `_delta_log`, leading to this error?&lt;BR /&gt;2. How can data ingestion continue without updates being recorded in the Delta log?&lt;BR /&gt;3. Is there a way to ensure that necessary files are retained during Vacuum to avoid such issues?&lt;/P&gt;&lt;P&gt;Currently, I have managed to fix the issue by identifying the last valid version after the Vacuum process and reading from that version. Since I am using readChangeFeed, I can read from the latest version if a new issue arises. However, I would like to better understand the root cause and how to prevent this problem in the future.&lt;/P&gt;&lt;P&gt;Thank you for your guidance!&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="minhhung0507_2-1736940030237.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/14106i64AF7B35E431C465/image-size/medium?v=v2&amp;amp;px=400" role="button" title="minhhung0507_2-1736940030237.png" alt="minhhung0507_2-1736940030237.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 15 Jan 2025 11:20:39 GMT</pubDate>
    <dc:creator>minhhung0507</dc:creator>
    <dc:date>2025-01-15T11:20:39Z</dc:date>
    <item>
      <title>Issue with DeltaFileNotFoundException After Vacuum and Missing Data Changes in Delta Log</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-with-deltafilenotfoundexception-after-vacuum-and-missing/m-p/105695#M42242</link>
      <description>&lt;P&gt;Dear Databricks experts,&lt;/P&gt;&lt;P&gt;I encountered the following error in Databricks:&lt;/P&gt;&lt;P&gt;`com.databricks.sql.transaction.tahoe.DeltaFileNotFoundException: [DELTA_EMPTY_DIRECTORY] No file found in the directory: gs://cimb-prod-lakehouse/bronze-layer/losdb/pl_message/_delta_log.`&lt;/P&gt;&lt;P&gt;This issue occurred after running a **Vacuum** operation. Despite continuous data ingestion, I noticed that no changes were reflected in the Delta log (`_delta_log`). This raises a few questions:&lt;/P&gt;&lt;P&gt;1. Why does the **Vacuum** operation delete essential files, such as those required for `_delta_log`, leading to this error?&lt;BR /&gt;2. How can data ingestion continue without updates being recorded in the Delta log?&lt;BR /&gt;3. Is there a way to ensure that necessary files are retained during Vacuum to avoid such issues?&lt;/P&gt;&lt;P&gt;Currently, I have managed to fix the issue by identifying the last valid version after the Vacuum process and reading from that version. Since I am using readChangeFeed, I can read from the latest version if a new issue arises. However, I would like to better understand the root cause and how to prevent this problem in the future.&lt;/P&gt;&lt;P&gt;Thank you for your guidance!&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="minhhung0507_2-1736940030237.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/14106i64AF7B35E431C465/image-size/medium?v=v2&amp;amp;px=400" role="button" title="minhhung0507_2-1736940030237.png" alt="minhhung0507_2-1736940030237.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 15 Jan 2025 11:20:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-with-deltafilenotfoundexception-after-vacuum-and-missing/m-p/105695#M42242</guid>
      <dc:creator>minhhung0507</dc:creator>
      <dc:date>2025-01-15T11:20:39Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with DeltaFileNotFoundException After Vacuum and Missing Data Changes in Delta Log</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-with-deltafilenotfoundexception-after-vacuum-and-missing/m-p/105708#M42249</link>
      <description>&lt;P&gt;&lt;SPAN&gt;The error you're encountering,&amp;nbsp;&lt;/SPAN&gt;com.databricks.sql.transaction.tahoe.DeltaFileNotFoundException: [DELTA_EMPTY_DIRECTORY] No file found in the directory: gs://cimb-prod-lakehouse/bronze-layer/losdb/pl_message/_delta_log&lt;SPAN&gt;, indicates that the&amp;nbsp;&lt;/SPAN&gt;_delta_log&lt;SPAN&gt;&amp;nbsp;directory is empty or missing, which is critical for Delta Lake operations. This issue can arise due to improper use of the&amp;nbsp;&lt;/SPAN&gt;VACUUM&lt;SPAN&gt;&amp;nbsp;operation.&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;The&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;VACUUM&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;operation in Delta Lake is used to remove old files that are no longer needed for the current state of the table. However, if the retention period is set too short, it can inadvertently delete files that are still needed for the Delta table's metadata and transaction log.&lt;/LI&gt;&lt;LI&gt;The default retention period for&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;VACUUM&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;is 7 days. If you set a shorter retention period, you risk deleting files that are still required.&lt;/LI&gt;&lt;LI&gt;If the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;_delta_log&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;directory is missing or corrupted, Delta Lake cannot properly record transactions. This can lead to inconsistencies and errors during data ingestion and querying.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;VACUUM my_table RETAIN 168 HOURS; -- Retain files for 7 days&lt;/P&gt;</description>
      <pubDate>Wed, 15 Jan 2025 13:23:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-with-deltafilenotfoundexception-after-vacuum-and-missing/m-p/105708#M42249</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2025-01-15T13:23:02Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with DeltaFileNotFoundException After Vacuum and Missing Data Changes in Delta Log</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-with-deltafilenotfoundexception-after-vacuum-and-missing/m-p/105824#M42273</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/22314"&gt;@saurabh18cs&lt;/a&gt;&amp;nbsp;,&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your explanation regarding the VACUUM operation and the error I encountered. I appreciate your insights.&lt;/P&gt;&lt;P&gt;I would like to clarify further: why does the VACUUM feature sometimes delete files that are still necessary and being referenced? Is this behavior considered a bug, or is it an inherent aspect of how the VACUUM operation functions? Understanding this will help me better manage the retention period and prevent future issues.&lt;/P&gt;&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34618"&gt;@VZLA&lt;/a&gt;&amp;nbsp;, I would appreciate it if you could let me know your thoughts on this matter.&lt;BR /&gt;&lt;BR /&gt;Thank you for your assistance!&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2025 03:41:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-with-deltafilenotfoundexception-after-vacuum-and-missing/m-p/105824#M42273</guid>
      <dc:creator>minhhung0507</dc:creator>
      <dc:date>2025-01-16T03:41:11Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with DeltaFileNotFoundException After Vacuum and Missing Data Changes in Delta Log</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-with-deltafilenotfoundexception-after-vacuum-and-missing/m-p/105829#M42277</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/135091"&gt;@minhhung0507&lt;/a&gt;&amp;nbsp;,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;You must choose a retention interval that is longer than the longest-running concurrent transaction and the longest period that any stream can lag behind the most recent update to the table. This ensures that tables are not corrupted by&amp;nbsp;&lt;SPAN class=""&gt;VACUUM&lt;/SPAN&gt;&amp;nbsp;deleting files that have not yet been committed.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;There is also a safety check that verifies no operations are being performed on the table that take longer than the retention interval you plan to specify. You can disable this safety check by setting the Spark configuration property&amp;nbsp;&lt;SPAN class=""&gt;spark.databricks.delta.retentionDurationCheck.enabled&lt;/SPAN&gt;&amp;nbsp;to&amp;nbsp;&lt;SPAN class=""&gt;false&lt;/SPAN&gt;.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Hope this helps!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2025 03:59:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-with-deltafilenotfoundexception-after-vacuum-and-missing/m-p/105829#M42277</guid>
      <dc:creator>Avinash_Narala</dc:creator>
      <dc:date>2025-01-16T03:59:18Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with DeltaFileNotFoundException After Vacuum and Missing Data Changes in Delta Log</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-with-deltafilenotfoundexception-after-vacuum-and-missing/m-p/105854#M42288</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/135091"&gt;@minhhung0507&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;The VACUUM command on a Delta table does not delete the _delta_log folder, as this folder contains all the metadata related to the Delta table. The _delta_log folder acts as a pointer where all changes are tracked. If the _delta_log folder is accidentally deleted, it cannot be recovered unless bucket versioning is enabled. If versioning is enabled, you can restore the deleted files and run the FSCK REPAIR command to fix the Delta table. However, it's important to understand how Delta performs the FSCK operation under the hood.&lt;/P&gt;&lt;P&gt;For more details on VACUUM, refer to&amp;nbsp;&lt;A href="https://docs.gcp.databricks.com/en/sql/language-manual/delta-vacuum.html" target="_blank" rel="noopener"&gt;VACUUM | Databricks on Google Cloud&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;If you are still unable to query the table because of missing Parquet files, you can fix it by running the following command; see&amp;nbsp;&lt;A href="https://docs.gcp.databricks.com/en/sql/language-manual/delta-fsck.html" target="_blank" rel="noopener"&gt;FSCK REPAIR TABLE | Databricks on Google Cloud&lt;/A&gt;:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;FSCK REPAIR TABLE table_name [DRY RUN]&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;BR /&gt;Hari Prasad&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2025 09:36:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-with-deltafilenotfoundexception-after-vacuum-and-missing/m-p/105854#M42288</guid>
      <dc:creator>hari-prasad</dc:creator>
      <dc:date>2025-01-16T09:36:04Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with DeltaFileNotFoundException After Vacuum and Missing Data Changes in Delta Log</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-with-deltafilenotfoundexception-after-vacuum-and-missing/m-p/105860#M42290</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/135091"&gt;@minhhung0507&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;This behavior is not a bug but rather an inherent aspect of how the&amp;nbsp;&lt;/SPAN&gt;VACUUM&lt;SPAN&gt;&amp;nbsp;operation functions. VACUUM does not delete from the _delta_log folder; this folder has its own default retention of 30 days:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Delta Lake maintains a transaction log (the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;_delta_log&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;directory) that records all changes to the table. This log enables ACID transactions, time travel, and versioning.&lt;/LI&gt;&lt;LI&gt;The transaction log contains metadata about the files that make up the table at any given point in time.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;It is up to you to decide how much time travel or versioning you want for your data, since data files consume storage space. Keeping the default 7 days is a good baseline: a longer retention incurs more storage cost, while a shorter one increases the risk of deleting files that are still needed. Matching the 30-day retention of _delta_log is also reasonable, but cost and your use case should drive the decision.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2025 10:02:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-with-deltafilenotfoundexception-after-vacuum-and-missing/m-p/105860#M42290</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2025-01-16T10:02:58Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with DeltaFileNotFoundException After Vacuum and Missing Data Changes in Delta Log</title>
      <link>https://community.databricks.com/t5/data-engineering/issue-with-deltafilenotfoundexception-after-vacuum-and-missing/m-p/106010#M42344</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/22314"&gt;@saurabh18cs&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Thanks for that very detailed explanation. I will take note and continue to observe this case.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jan 2025 03:21:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/issue-with-deltafilenotfoundexception-after-vacuum-and-missing/m-p/106010#M42344</guid>
      <dc:creator>minhhung0507</dc:creator>
      <dc:date>2025-01-17T03:21:10Z</dc:date>
    </item>
  </channel>
</rss>

