02-14-2023 09:59 PM
df = (spark.readStream.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)
    .table("CatalogName.SchemaName.TableName")
)
display(df)
This query fails with the following error: A file referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table `DELETE` statement.
When I check the path referenced in the error, the file is not present there. It was not deleted manually, and I did not run OPTIMIZE myself.

Does a Delta table have any default setting that optimizes the table automatically? When I check the table history, there are a few OPTIMIZE records, and after that OPTIMIZE I can no longer read the earlier versions of the table. The current data is all available, but I cannot see the data from the initial versions.
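To see whether OPTIMIZE or VACUUM ran automatically, the table history and properties can be inspected. A sketch, using the placeholder three-part table name from the question:

```sql
-- List recent operations on the table; look for OPTIMIZE and VACUUM entries
-- and the timestamps at which they ran.
DESCRIBE HISTORY CatalogName.SchemaName.TableName;

-- Check table properties that can trigger automatic maintenance,
-- e.g. delta.autoOptimize.autoCompact, or a shortened retention window.
SHOW TBLPROPERTIES CatalogName.SchemaName.TableName;
```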
- Labels:
  - Transaction Log
Accepted Solutions
03-22-2023 05:08 PM
Had you run VACUUM on the table? VACUUM physically deletes data files that are marked for removal and are older than the retention period (7 days by default).

OPTIMIZE compacts small files into larger ones and marks the small files for removal, but it does not physically delete them, so earlier versions remain readable until a VACUUM runs. A stream reading the change feed from startingVersion 1 fails once VACUUM has deleted the files that those early versions reference.
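If VACUUM is the cause, it will appear in the table history, and the stream will need to restart from a startingVersion (or startingTimestamp) that is still within the retention window. A sketch of useful checks, assuming the placeholder table name from the question:

```sql
-- Preview which files the next VACUUM would delete, without deleting anything
-- (default retention is 168 hours, i.e. 7 days).
VACUUM CatalogName.SchemaName.TableName RETAIN 168 HOURS DRY RUN;

-- Lengthen the retention window if older versions must stay readable longer.
ALTER TABLE CatalogName.SchemaName.TableName
  SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = '30 days');
```

Note that a longer retention window keeps more data files on storage, so this trades storage cost for time-travel depth.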

