Columns discovery_time, commit_time, archive_time always NULL when running cloud_files_state
04-27-2023 12:31 AM
I am trying to find the commit_time for a given file of a Delta table using the cloud_files_state command. However, the discovery_time, commit_time, and archive_time columns are always NULL. I am running Databricks Runtime 11.3 and have also verified this with Runtime 13.0 ML.
The issue has also been raised in the following post: https://community.databricks.com/s/question/0D58Y00009gd0TDSAY/auto-loader-empty-fields-discoverytim...
Is this a bug? Is any fix available?
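To confirm the behavior on a particular checkpoint, you could query cloud_files_state and check which timestamp columns are NULL in every returned row. Below is a minimal sketch in Python. It assumes the rows have already been collected into plain dicts; on Databricks this might look like [r.asDict() for r in spark.sql(query).collect()], which needs a live Spark session and is not shown. The helper names build_state_query and all_null_columns are hypothetical, and the checkpoint path is a placeholder.

```python
from typing import Any, Dict, Iterable, List, Sequence

# Timestamp columns of cloud_files_state that this check inspects.
TIMESTAMP_COLUMNS = ("discovery_time", "commit_time", "archive_time")


def build_state_query(checkpoint_path: str) -> str:
    """Build a SQL query over the cloud_files_state table-valued function.

    Assumes checkpoint_path contains no single quotes (no escaping is done).
    """
    return (
        "SELECT path, discovery_time, commit_time, archive_time "
        f"FROM cloud_files_state('{checkpoint_path}')"
    )


def all_null_columns(
    rows: Iterable[Dict[str, Any]], columns: Sequence[str] = TIMESTAMP_COLUMNS
) -> List[str]:
    """Return the columns that are None in every row; empty input returns []."""
    materialized = list(rows)
    if not materialized:
        # No rows means there is nothing to conclude about any column.
        return []
    return [c for c in columns if all(row.get(c) is None for row in materialized)]
```

If only discovery_time were populated in the collected rows, all_null_columns(rows) would return ['commit_time', 'archive_time'].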
- Labels: Cloud_files_state, CloudFiles
04-28-2023 10:57 AM
@Morten Stakkeland:
The issue you are facing with the cloud_files_state command is a known limitation in Delta Lake as of the latest stable release (Delta Lake 1.0). The commit_time and protocol columns are always null, and the archive_time column is also null for most files. This is because Delta Lake does not track commit_time and protocol for files written through the cloud storage API, and archive_time is only set when the file is actively being managed by Delta Lake's retention mechanism.
There is a feature request to address this limitation and provide more accurate commit_time and protocol information for files written through cloud storage APIs, but it has not been implemented yet. You can track its status in the Delta Lake GitHub repository. As for archive_time, if you need to track it for a specific file, you can inspect the table's transaction log (the _delta_log directory) to find the commit that added or removed the file. From there, you can use the versionAsOf read option to read the table as it existed at that commit.
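The log-inspection workaround above can be sketched in Python. This is a minimal sketch, assuming the table directory is reachable as a local or mounted path and that the relevant JSON commit files have not yet been removed by log cleanup. It reads only the numbered <version>.json files in _delta_log and ignores checkpoint Parquet files. file_history is a hypothetical helper name. Note that the log records the table's own data files (paths relative to the table root, possibly URL-encoded), not the source files that Auto Loader discovered, so data_file must match the path exactly as it appears in an add or remove action.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional


def file_history(table_path: str, data_file: str) -> List[Dict[str, Any]]:
    """List add/remove actions for one data file in a Delta table's JSON commits.

    Sketch only: checkpoint Parquet files and cleaned-up commits are not covered.
    """
    log_dir = Path(table_path) / "_delta_log"
    events: List[Dict[str, Any]] = []
    # Commit files use a zero-padded version number as their name, so sorting
    # by name gives version order. Skip any .json file whose stem is not all digits.
    for commit_file in sorted(log_dir.glob("*.json")):
        if not commit_file.stem.isdigit():
            continue
        version = int(commit_file.stem)
        commit_ms: Optional[int] = None
        matched: List[str] = []
        with commit_file.open(encoding="utf-8") as fh:
            # Each non-empty line of a commit file holds one JSON action.
            for line in fh:
                if not line.strip():
                    continue
                action = json.loads(line)
                # commitInfo carries the commit timestamp in epoch milliseconds.
                if "commitInfo" in action:
                    commit_ms = action["commitInfo"].get("timestamp")
                for kind in ("add", "remove"):
                    if action.get(kind, {}).get("path") == data_file:
                        matched.append(kind)
        commit_time = (
            datetime.fromtimestamp(commit_ms / 1000, tz=timezone.utc)
            if commit_ms is not None
            else None
        )
        for kind in matched:
            events.append({"version": version, "action": kind, "commit_time": commit_time})
    return events
```

Calling file_history("<table path>", "<data file path>") returns one entry per commit that added or removed that file, with the commit timestamp converted to UTC. The version numbers can then be passed to versionAsOf.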

